Method of and apparatus for data aggregation utilizing a multidimensional database and multi-stage data aggregation operations

ABSTRACT

An improved method of and apparatus for aggregating data including a scalable multidimensional database (MDDB) storing multidimensional data logically organized along N dimensions and a high performance aggregation engine that performs multi-stage data aggregation operations on the multidimensional data. A first stage of such data aggregation operations is performed along a first dimension of the N dimensions; and a second stage of such data aggregation operations is performed for a given slice in the first dimension along another dimension of the N dimensions. Such multi-stage data aggregation operations achieve a significant increase in system performance (e.g. decreased access/search time). The MDDB and high performance aggregation engine of the present invention may be integrated into a standalone data aggregation server supporting an OLAP system (one or more OLAP servers and clients), or may be integrated into a database management system (DBMS), thus achieving improved user flexibility and ease of use. The improved DBMS system of the present invention can be used to realize an improved Data Warehouse for supporting on-line analytical processing (OLAP) operations or to realize an improved informational database system, operational database system, or the like.

RELATED CASES

[0001] This is a Continuation of copending U.S. application Ser. No. 09/796,098 entitled “Data Aggregation Server For Managing A Multi-Dimensional Database and Database Management System Having Data Aggregation Server Integrated Therein” filed Feb. 28, 2001, which is a Continuation-in-part of: copending U.S. application Ser. No. 09/514,611 entitled “Stand-Alone Cartridge-Style Data Aggregation Server And Method of And System For Managing Multi-Dimensional Databases using the Same”, filed Feb. 28, 2000, and U.S. application Ser. No. 09/634,748 entitled “Relational Database Management System Having Integrated Non-Relational Multi-Dimensional Data Store of Aggregated Data Elements” filed Aug. 9, 2000; each said Application being commonly owned by HyperRoll Israel, Limited, and incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

[0002] 1. Field of Invention

[0003] The present invention relates to a method of and system for aggregating data elements in a multi-dimensional database (MDDB) supported upon a computing platform, and also to an improved method of and system for managing data elements within a MDDB during on-line analytical processing (OLAP) operations and as an integral part of a database management system.

[0004] 2. Brief Description Of The State Of The Art

[0005] The ability to act quickly and decisively in today's increasingly competitive marketplace is critical to the success of organizations. The volume of information that is available to corporations is rapidly increasing and frequently overwhelming. Those organizations that will effectively and efficiently manage these tremendous volumes of data, and use the information to make business decisions, will realize a significant competitive advantage in the marketplace.

[0006] Data warehousing, the creation of an enterprise-wide data store, is the first step towards managing these volumes of data. The Data Warehouse is becoming an integral part of many information delivery systems because it provides a single, central location where a reconciled version of data extracted from a wide variety of operational systems is stored. Over the last few years, improvements in price, performance, scalability, and robustness of open computing systems have made data warehousing a central component of Information Technology (IT) strategies. Details on methods of data integration and constructing data warehouses can be found in the white paper entitled “Data Integration: The Warehouse Foundation” by Louis Rolleigh and Joe Thomas, published at http://www.acxiom.com/whitepapers/wp-11.asp.

[0007] Building a Data Warehouse has its own special challenges (e.g. using common data model, common business dictionary, etc.) and is a complex endeavor. However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing. To complete the supply chain from transactional systems to decision maker, organizations need to deliver systems that allow knowledge workers to make strategic and tactical decisions based on the information stored in these warehouses. These decision support systems are referred to as On-Line Analytical Processing (OLAP) systems. OLAP systems allow knowledge workers to intuitively, quickly, and flexibly manipulate operational data using familiar business terms, in order to provide analytical insight into a particular problem or line of inquiry. For example, by using an OLAP system, decision makers can “slice and dice” information along a customer (or business) dimension, and view business metrics by product and through time. Reports can be defined from multiple perspectives that provide a high-level or detailed view of the performance of any aspect of the business. Decision makers can navigate throughout their database by drilling down on a report to view elements at finer levels of detail, or by pivoting to view reports from different perspectives. To enable such full-functioned business analyses, OLAP systems need to (1) support sophisticated analyses, (2) scale to large numbers of dimensions, and (3) support analyses against large atomic data sets. These three key requirements are discussed further below.

[0008] Decision makers use key performance metrics to evaluate the operations within their domain, and OLAP systems need to be capable of delivering these metrics in a user-customizable format. These metrics may be obtained from the transactional databases, precalculated and stored in the database, or generated on demand during the query process. Commonly used metrics include:

[0009] (1) Multidimensional Ratios (e.g. Percent to Total): “Show me the contribution to weekly sales and category profit made by all items sold in the Northwest stores between July 1 and July 14.”

[0010] (2) Comparisons (e.g. Actual vs. Plan, This Period vs. Last Period): “Show me the sales to plan percentage variation for this year and compare it to that of the previous year to identify planning discrepancies.”

[0011] (3) Ranking and Statistical Profiles (e.g. Top N/Bottom N, 70/30, Quartiles): “Show me sales, profit and average call volume per day for my 20 most profitable salespeople, who are in the top 30% of the worldwide sales.”

[0012] (4) Custom Consolidations: “Show me an abbreviated income statement by quarter for the last two quarters for my Western Region operations.”

[0013] Knowledge workers analyze data from a number of different business perspectives or dimensions. As used hereinafter, a dimension is any element or hierarchical combination of elements in a data model that can be displayed orthogonally with respect to other combinations of elements in the data model. For example, if a report lists sales by week, promotion, store, and department, then the report would be a slice of data taken from a four-dimensional data model.

[0014] Target marketing and market segmentation applications involve extracting highly qualified result sets from large volumes of data. For example, a direct marketing organization might want to generate a targeted mailing list based on dozens of characteristics, including purchase frequency, size of the last purchase, past buying trends, customer location, age of customer, and gender of customer. These applications rapidly increase the dimensionality requirements for analysis.

[0015] The number of dimensions in OLAP systems ranges from a few orthogonal dimensions to hundreds of orthogonal dimensions. Orthogonal dimensions in an exemplary OLAP application might include Geography, Time, and Products.

[0016] Atomic data refers to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, “atomic data” may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch. Most organizations implementing OLAP systems find themselves needing systems that can scale to tens, hundreds, and even thousands of gigabytes of atomic information.

[0017] As OLAP systems become more pervasive and are used by the majority of the enterprise, more data over longer time frames will be included in the data store (i.e. data warehouse), and the size of the database will increase by at least an order of magnitude. Thus, OLAP systems need to be able to scale from present to near-future volumes of data.

[0018] In general, OLAP systems need to (1) support the complex analysis requirements of decision-makers, (2) analyze the data from a number of different perspectives (i.e. business dimensions), and (3) support complex analyses against large input (atomic-level) data sets from a Data Warehouse maintained by the organization using a relational database management system (RDBMS).

[0019] Vendors of OLAP systems classify OLAP Systems as either Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP) based on the underlying architecture thereof. Thus, there are two basic architectures for On-Line Analytical Processing systems: the ROLAP Architecture, and the MOLAP architecture.

[0020] The Relational OLAP (ROLAP) system accesses data stored in a Data Warehouse to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database, i.e. the Data Warehouse.

[0021] The ROLAP architecture was invented to enable direct access of data from Data Warehouses, and therefore support optimization techniques to meet batch window requirements and provide fast response times. Typically, these optimization techniques include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.

[0022] A typical prior art ROLAP system has a three-tier or layer client/server architecture. The “database layer” utilizes relational databases for data storage, access, and retrieval processes. The “application logic layer” is the ROLAP engine which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of “presentation layers,” through which users perform OLAP analyses.

[0023] After the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP engine, which then dynamically transforms the requests into SQL execution plans. The SQL execution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing pre-calculated results when they are available, or dynamically generating results from atomic information when necessary.

[0024] Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDDB) to provide OLAP analyses. The MDDB is logically organized as a multidimensional array (typically referred to as a multidimensional cube or hypercube or cube) whose rows/columns each represent a different dimension (i.e., relation). A data value is associated with each combination of dimensions (typically referred to as a “coordinate”). The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multidimensionally.

[0025] As shown in FIG. 1B, prior art MOLAP systems have an Aggregation, Access and Retrieval module which is responsible for all data storage, access, and retrieval processes, including data aggregation (i.e. preaggregation) in the MDDB. As shown in FIG. 1B, the base data loader is fed with base data, at the most detailed level, from the Data Warehouse, and loads it into the Multi-Dimensional Data Base (MDDB). On top of the base data, layers of aggregated data are built up by the Aggregation program, which is part of the Aggregation, Access and Retrieval module. As indicated in this figure, the application logic module is responsible for the execution of all OLAP requests/queries (e.g. ratios, ranks, forecasts, exception scanning, and slicing and dicing) of data within the MDDB. The presentation module integrates with the application logic module and provides an interface, through which the end users view and request OLAP analyses on their client machines which may be web-enabled through the infrastructure of the Internet. The client/server architecture of a MOLAP system allows multiple users to access the same multidimensional database (MDDB).

[0026] Information (i.e. basic data) from a variety of operational systems within an enterprise, comprising the Data Warehouse, is loaded into a prior art multidimensional database (MDDB) through a series of batch routines. The Express™ server by the Oracle Corporation is exemplary of a popular server which can be used to carry out the data loading process in prior art MOLAP systems. As shown in FIG. 2B, an exemplary 3-D MDDB is schematically depicted, showing geography, time and products as the “dimensions” of the database. The multidimensional data of the MDDB is logically organized in an array structure, as shown in FIG. 2C. Physically, the Express™ server stores data in pages (or records) of an information file. Pages contain 512, or 2048, or 4096 bytes of data, depending on the platform and release of the Express™ server. In order to look up the physical record address from the database file recorded on a disk or other mass storage device, the Express™ server generates a data structure referred to as a “Page Allocation Table (PAT)”. As shown in FIG. 2D, the PAT tells the Express™ server the physical record number that contains the page of data. Typically, the PAT is organized in pages. The simplest way to access a data element in the MDDB is by calculating the “offset” using the additions and multiplications expressed by a simple formula:

Offset = Months + Product * (#_of_Months) + City * (#_of_Months * #_of_Products)
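The offset formula above can be illustrated with a minimal sketch. It assumes zero-based integer indices for month, product, and city, and a flat list standing in for the MDDB storage; the function and parameter names (`cell_offset`, `n_months`, `n_products`) are hypothetical and are not taken from the Express™ server.

```python
def cell_offset(month, product, city, n_months, n_products):
    """Map a (month, product, city) coordinate to a position in a flat array.

    Month varies fastest, then product, then city, mirroring
    Offset = Months + Product*(#_of_Months) + City*(#_of_Months * #_of_Products).
    All indices are assumed to be zero-based.
    """
    return month + product * n_months + city * (n_months * n_products)


def make_flat_cube(n_months, n_products, n_cities, fill=0):
    """Allocate a flat list with one slot per cell of the 3-D cube."""
    return [fill] * (n_months * n_products * n_cities)


# Illustrative sizes (assumed): 12 months, 5 products, 3 cities.
cube = make_flat_cube(12, 5, 3)
cube[cell_offset(3, 2, 1, n_months=12, n_products=5)] = 250
```

Because the mapping is one-to-one over the valid index ranges, every coordinate lands in its own slot, and a lookup costs only two multiplications and two additions.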

[0027] During an OLAP session, the response time of a multidimensional query on a prior art MDDB depends on how many cells in the MDDB have to be added “on the fly”. As the number of dimensions in the MDDB increases linearly, the number of the cells in the MDDB increases exponentially. However, it is known that the majority of multidimensional queries deal with summarized high level data. Thus, as shown in FIGS. 3A and 3B, once the atomic data (i.e. “basic data”) has been loaded into the MDDB, the general approach is to perform a series of calculations in batch in order to aggregate (i.e. pre-aggregate) the data elements along the orthogonal dimensions of the MDDB and fill the array structures thereof. For example, revenue figures for all retail stores in a particular state (e.g. New York) would be added together to fill the state level cells in the MDDB. After the array structure in the database has been filled, integer-based indices are created and hashing algorithms are used to improve query access times. Pre-aggregation of dimension D0 is always performed along the cross-section of the MDDB along the D0 dimension.
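Two points from the preceding paragraph can be sketched in a few lines: the cell count is the product of the dimension sizes (so it grows exponentially with the number of dimensions), and a state-level cell is filled by summing its store-level children. This is only an illustration under assumed data; the names `cell_count`, `roll_up`, and the store-to-state mapping are hypothetical.

```python
from collections import defaultdict
from math import prod


def cell_count(dimension_sizes):
    """Number of cells in a dense cube: the product of all dimension sizes."""
    return prod(dimension_sizes)


def roll_up(values, parent_of):
    """Sum child-level values into their parent members (e.g. stores into states)."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)


# Assumed example data: revenue per store and the state each store belongs to.
store_revenue = {"Albany": 5, "Buffalo": 7, "Newark": 3}
state_of = {"Albany": "New York", "Buffalo": "New York", "Newark": "New Jersey"}
state_revenue = roll_up(store_revenue, state_of)
```

With ten members per dimension, three dimensions give 1,000 cells while six give 1,000,000, which is why summarized cells are computed in batch rather than added together at query time.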

[0028] As shown in FIGS. 3C1 and 3C2, the raw data loaded into the MDDB is primarily organized at its lowest dimensional hierarchy, and the results of the pre-aggregations are stored in the neighboring parts of the MDDB.

[0029] As shown in FIG. 3C2, along the TIME dimension, weeks are the aggregation results of days, months are the aggregation results of weeks, and quarters are the aggregation results of months. While not shown in the figures, along the GEOGRAPHY dimension, states are the aggregation results of cities, countries are the aggregation results of states, and continents are the aggregation results of countries. By pre-aggregating (i.e. consolidating or compiling) all logical subtotals and totals along all dimensions of the MDDB, it is possible to carry out real-time MOLAP operations using a multidimensional database (MDDB) containing both basic (i.e. atomic) and pre-aggregated data. Once this compilation process has been completed, the MDDB is ready for use. Users request OLAP reports by submitting queries through the OLAP Application interface (e.g. using web-enabled client machines), and the application logic layer responds to the submitted queries by retrieving the stored data from the MDDB for display on the client machine.
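The idea of pre-aggregating every subtotal and total so that queries become simple lookups can be sketched as follows. This is a simplified illustration, not the prior art implementation: it assumes a single-level hierarchy per dimension (each member rolls up directly to an "all" member, written `*` here), and the names `preaggregate` and `ALL` are hypothetical.

```python
from collections import defaultdict
from itertools import product as cartesian

ALL = "*"  # assumed marker for "all members of this dimension"


def preaggregate(base_cells):
    """Compute every subtotal and total of a cube in one batch pass.

    base_cells maps a coordinate tuple, e.g. (city, product, month), to a value.
    For each base cell, every combination of "keep this coordinate" or
    "replace it with ALL" receives the cell's value, so the result holds the
    base data plus all subtotals and the grand total.
    """
    cube = defaultdict(int)
    for coord, value in base_cells.items():
        for mask in cartesian((False, True), repeat=len(coord)):
            key = tuple(ALL if collapsed else member
                        for member, collapsed in zip(coord, mask))
            cube[key] += value
    return dict(cube)


# Assumed example data.
base = {
    ("NYC", "tv", "Jan"): 10,
    ("NYC", "radio", "Jan"): 5,
    ("Boston", "tv", "Feb"): 7,
}
cube = preaggregate(base)
nyc_total = cube[("NYC", ALL, ALL)]  # answered by lookup, no summation at query time
```

Each base cell contributes to 2^N aggregate cells for N dimensions, which hints at why the stored data grows so quickly as dimensions are added.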

[0030] Typically, in MDDB systems, the aggregated data is very sparse, tending to explode as the number of dimensions grows and dramatically slowing down the retrieval process (as described in the report entitled “Database Explosion: The OLAP Report”, http://www.olapreport.com/DatabaseExplosion.htm, incorporated herein by reference). Quick, on-line retrieval of queried data is critical in delivering on-line response for OLAP queries. Therefore, the data structure of the MDDB, and the methods of storing, indexing and handling it, are dictated mainly by the need for fast retrieval of massive and sparse data.

[0031] Different solutions for this problem are disclosed in the following US Patents, each of which is incorporated herein by reference in its entirety:

[0032] U.S. Pat. No. 5,822,751 “Efficient Multidimensional Data Aggregation Operator Implementation”

[0033] U.S. Pat. No. 5,805,885 “Method And System For Aggregation Objects”

[0034] U.S. Pat. No. 5,781,896 “Method And System For Efficiently Performing Database Table Aggregation Using An Aggregation Index”

[0035] U.S. Pat. No. 5,745,764 “Method And System For Aggregation Objects”

[0036] In all the prior art OLAP servers, the process of storing, indexing and handling the MDDB utilizes complex data structures to largely improve the retrieval speed, as part of the querying process, at the cost of slowing down the storing and aggregation. The query-bounded structure, which must support fast retrieval of queries in a restricting environment of high sparsity and multi-hierarchies, is not the optimal one for fast aggregation.

[0037] In addition to the aggregation process, the Aggregation, Access and Retrieval module is responsible for all data storage, retrieval and access processes. The Logic module is responsible for the execution of OLAP queries. The Presentation module intermediates between the user and the logic module and provides an interface through which the end users view and request OLAP analyses. The client/server architecture allows multiple users to simultaneously access the multidimensional database.

[0038] In summary, general system requirements of OLAP systems include: (1) supporting sophisticated analysis, (2) scaling to a large number of dimensions, and (3) supporting analysis against large atomic data sets.

[0039] MOLAP system architecture is capable of providing analytically sophisticated reports and analysis functionality. However, requirements (2) and (3) fundamentally limit MOLAP's capability, because to be effective and to meet end-user requirements, MOLAP databases need a high degree of aggregation.

[0040] By contrast, the ROLAP system architecture allows the construction of systems requiring a low degree of aggregation, but such systems are significantly slower than systems based on MOLAP system architecture principles. The resulting long aggregation times of ROLAP systems impose severe limitations on their volume and dimensional capabilities.

[0041] The graphs plotted in FIG. 5 clearly indicate the computational demands that are created when searching an MDDB during an OLAP session, where queries are presented to the MOLAP system, and answers thereto are solicited, often under real-time constraints. However, prior art MOLAP systems have limited capabilities to dynamically create data aggregations or to calculate business metrics that have not been precalculated and stored in the MDDB.

[0042] The large volumes of data and the high dimensionality of certain market segmentation applications are orders of magnitude beyond the limits of current multidimensional databases.

[0043] ROLAP is capable of higher data volumes. However, the ROLAP architecture, despite its high volume and dimensionality superiority, suffers from several significant drawbacks as compared to MOLAP:

[0044] Full aggregation of large data volumes is very time consuming; otherwise, partial aggregation severely degrades the query response.

[0045] It has a slower query response.

[0046] It requires developers and end users to know SQL.

[0047] SQL is less capable of the sophisticated analytical functionality necessary for OLAP.

[0048] ROLAP provides limited application functionality.

[0049] Thus, improved techniques for data aggregation within MOLAP systems would appear to allow the number of dimensions of and the size of atomic (i.e. basic) data sets in the MDDB to be significantly increased, and thus increase the usage of the MOLAP system architecture.

[0050] Also, improved techniques for data aggregation within ROLAP systems would appear to allow for maximized query performance on large data volumes, and reduce the time of partial aggregations that degrade query response, and thus generally benefit ROLAP system architectures.

[0051] Thus, there is a great need in the art for an improved way of and means for aggregating data elements within a multi-dimensional database (MDDB), while avoiding the shortcomings and drawbacks of prior art systems and methodologies.

[0052] Modern operational and informational database systems, as described above, typically use a database management system (DBMS) (such as an RDBMS system, object database system, or object/relational database system) as a repository for storing data and querying the data. FIG. 14 illustrates a data warehouse-OLAP domain that utilizes the prior art approaches described above. The data warehouse is an enterprise-wide data store. It is becoming an integral part of many information delivery systems because it provides a single, central location wherein a reconciled version of data extracted from a wide variety of operational systems is stored. Details on methods of data integration and constructing data warehouses can be found in the white paper entitled “Data Integration: The Warehouse Foundation” by Louis Rolleigh and Joe Thomas, published at http://www.acxiom.com/whitepapers/wp-11.asp.

[0053] Building a Data Warehouse has its own special challenges (e.g. using common data model, common business dictionary, etc.) and is a complex endeavor. However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing. To complete the supply chain from transactional systems to decision maker, organizations need to deliver systems that allow knowledge workers to make strategic and tactical decisions based on the information stored in these warehouses. These decision support systems are referred to as On-Line Analytical Processing (OLAP) systems. Such OLAP systems are commonly classified as Relational OLAP systems or Multi-Dimensional OLAP systems as described above.

[0054] The Relational OLAP (ROLAP) system accesses data stored in a relational database (which is part of the Data Warehouse) to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database. The ROLAP architecture was invented to enable direct access of data from Data Warehouses, and therefore support optimization techniques to meet batch window requirements and provide fast response times. Typically, these optimization techniques include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.

[0055] As described above, a typical ROLAP system has a three-tier or layer client/server architecture. The “database layer” utilizes relational databases for data storage, access, and retrieval processes. The “application logic layer” is the ROLAP engine which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of “presentation layers,” through which users perform OLAP analyses. After the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP engine, which then dynamically transforms the requests into SQL execution plans. The SQL execution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing pre-calculated results when they are available, or dynamically generating results from the raw information when necessary.

[0056] The Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDDB) (or “cube”) to provide OLAP analyses. The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multidimensionally. Such MOLAP systems provide an interface that enables users to query the MDDB data structure such that users can “slice and dice” the aggregated data. As shown in FIG. 15, such MOLAP systems have an aggregation engine which is responsible for all data storage, access, and retrieval processes, including data aggregation (i.e. pre-aggregation) in the MDDB, and an analytical processing and GUI module responsible for interfacing with a user to provide analytical analysis, query input, and reporting of query results to the user. In a relational database, data is stored in tables. In contrast, the MDDB is a non-relational data structure: it uses other data structures, either instead of or in addition to tables, to store data.

[0057] There are other application domains where there is a great need for improved methods of and apparatus for carrying out data aggregation operations. For example, modern operational and informational databases represent such domains. As described above, modern operational and informational databases typically utilize a relational database system (RDBMS) as a repository for storing data and querying data. FIG. 16A illustrates an exemplary table in an RDBMS; and FIGS. 16B and 16C illustrate operators (queries) on the table of FIG. 16A, and the result of such queries, respectively. The operators illustrated in FIGS. 16B and 16C are expressed as Structured Query Language (SQL) statements as is conventional in the art.

[0058] The choice of using a RDBMS as the data repository in information database systems naturally stems from the realities of SQL standardization, the wealth of RDBMS-related tools, and readily available expertise in RDBMS systems. However, the querying component of RDBMS technology suffers from performance and optimization problems stemming from the very nature of the relational data model. More specifically, during query processing, the relational data model requires a mechanism that locates the raw data elements that match the query. Moreover, to support queries that involve aggregation operations, such aggregation operations must be performed over the raw data elements that match the query. For large multidimensional databases, a naive implementation of these operations involves computationally intensive table scans that lead to unacceptable query response times.

[0059] In order to better understand how the prior art has approached this problem, it will be helpful to briefly describe the relational database model. According to the relational database model, a relational database is represented by a logical schema and tables that implement the schema. The logical schema is represented by a set of templates that define one or more dimensions (entities) and attributes associated with a given dimension. The attributes associated with a given dimension include one or more attributes that distinguish it from every other dimension in the database (a dimension identifier). Relationships amongst dimensions are formed by joining attributes. The data structure that represents the set of templates and relations of the logical schema is typically referred to as a catalog or dictionary. Note that the logical schema represents the relational organization of the database, but does not hold any fact data per se. This fact data is stored in tables that implement the logical schema.

[0060] Star schemas are frequently used to represent the logical structure of a relational database. The basic premise of star schemas is that information can be classified into two groups: facts and dimensions. Facts are the core data elements being analyzed. For example, units of individual items sold are facts, while dimensions are attributes about the facts. For example, dimensions are the product types purchased and the date of purchase. Business questions against this schema are asked by looking up specific facts (UNITS) through a set of dimensions (MARKETS, PRODUCTS, PERIOD). The central fact table is typically much larger than any of its dimension tables.

[0061] An exemplary star schema is illustrated in FIG. 17A for suppliers (the “Supplier” dimension) and parts (the “Parts” dimension) over time periods (the “Time-Period” dimension). It includes a central fact table “Supplied-Parts” that relates to multiple dimensions: the “Supplier”, “Parts” and “Time-Period” dimensions. FIG. 17B illustrates the tables used to implement the star schema of FIG. 17A. More specifically, these tables include a central fact table and a dimension table for each dimension in the logical schema of FIG. 17A. A given dimension table stores rows (instances) of the dimension defined in the logical schema. For the sake of description, FIG. 17B illustrates the dimension table for the “Time-Period” dimension only. Similar dimension tables for the “Supplier” and “Part” dimensions (not shown) are also included in such an implementation. Each row within the central fact table includes a multi-part key associated with a set of facts (in this example, a number representing a quantity). The multi-part key of a given row (values stored in the S#, P#, TP# fields as shown) points to rows (instances) stored in the dimension tables described above. A more detailed description of star schemas and the tables used to implement star schemas may be found in C. J. Date, “An Introduction to Database Systems,” Seventh Edition, Addison-Wesley, 2000, pp. 711-715, herein incorporated by reference in its entirety.

[0062] When processing a query, the tables that implement the schema are accessed to retrieve the facts that match the query. For example, in a star schema implementation as described above, the facts are retrieved from the central fact table and/or the dimension tables. Locating the facts that match a given query involves one or more join operations. Moreover, to support queries that involve aggregation operations, such aggregation operations must be performed over the facts that match the query. For large multi-dimensional databases, a naive implementation of these operations involves computationally intensive table scans that typically lead to unacceptable query response times. Moreover, since the fact tables are pre-summarized and aggregated along business dimensions, these tables tend to be very large. This point becomes an important consideration of the performance issues associated with star schemas. A more detailed discussion of the performance issues (and proposed approaches that address such issues) related to joining and aggregation of star schema is now set forth.
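The join-then-aggregate pattern described above can be sketched with an in-memory SQLite database from the Python standard library. The table and column names (`supplied_parts`, `time_period`, `s_id`, `p_id`, `tp_id`, `qty`, `quarter`) and the sample rows are assumptions chosen to resemble the “Supplied-Parts” example; they are not the contents of FIG. 17B.

```python
import sqlite3


def build_star_schema():
    """Create a tiny star schema: one fact table and one dimension table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE time_period (tp_id TEXT PRIMARY KEY, quarter TEXT);
        CREATE TABLE supplied_parts (s_id TEXT, p_id TEXT, tp_id TEXT, qty INTEGER);
        INSERT INTO time_period VALUES ('TP1', 'Q1'), ('TP2', 'Q1'), ('TP3', 'Q2');
        INSERT INTO supplied_parts VALUES
            ('S1', 'P1', 'TP1', 100),
            ('S1', 'P2', 'TP2', 50),
            ('S2', 'P1', 'TP3', 70),
            ('S2', 'P1', 'TP1', 30);
    """)
    return conn


def quantity_by_quarter(conn):
    """Join the fact table to the time dimension, then aggregate per quarter."""
    return conn.execute("""
        SELECT t.quarter, SUM(f.qty)
        FROM supplied_parts AS f
        JOIN time_period AS t ON f.tp_id = t.tp_id
        GROUP BY t.quarter
        ORDER BY t.quarter
    """).fetchall()


conn = build_star_schema()
totals = quantity_by_quarter(conn)
```

Without an index on the join key, the database must scan the fact table to answer this query, which is the cost the paragraph refers to when the fact table is large.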

[0063] The first performance issue arises from computationally intensive table scans that are performed by a naive implementation of data joining. Indexing schemes may be used to bypass these scans when performing joining operations. Such schemes include B-tree indexing, inverted list indexing and aggregate indexing. A more detailed description of such indexing schemes can be found in “The Art of Indexing”, Dynamic Information Systems Corporation, October 1999, available at http://www.disc.com/artindex.pdf. All of these indexing schemes replace table scan operations (involved in locating the data elements that match a query) with one or more index lookup operations. Inverted list indexing associates an index with a group of data elements, and stores (at a location identified by the index) a group of pointers to the associated data elements. During query processing, in the event that the query matches the index, the pointers stored in the index are used to retrieve the corresponding data elements to which they point. Aggregation indexing integrates an aggregation index with an inverted list index to provide pointers to raw data elements that require aggregation, thereby providing for dynamic summarization of the raw data elements that match the user-submitted query.
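The inverted list indexing described above may be sketched, under simplifying assumptions, as follows. The rows, column names and the functions `build_inverted_index` and `total_sales` are hypothetical; list positions stand in for the pointers to data elements, and the summation stands in for dynamic summarization of the raw data elements reached through the index.

```python
from collections import defaultdict

# Hypothetical raw data elements; positions in this list serve as "pointers".
rows = [
    {"region": "East", "product": "bolt", "sales": 5},
    {"region": "West", "product": "bolt", "sales": 7},
    {"region": "East", "product": "nut", "sales": 3},
]

def build_inverted_index(rows, column):
    """Map each distinct value of `column` to the positions of matching rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index

region_index = build_inverted_index(rows, "region")

def total_sales(index, key):
    """Use an index lookup (not a table scan) to find rows, then aggregate them."""
    return sum(rows[position]["sales"] for position in index.get(key, []))

east_total = total_sales(region_index, "East")
```

Only the rows pointed to by the matching index entry are visited; rows for other regions are never read.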

[0064] These indexing schemes are intended to improve join operations by replacing table scan operations with one or more index lookup operations in order to locate the data elements that match a query. However, these indexing schemes suffer from various performance issues as follows:

[0065] Since the tables in the star schema design typically contain the entire hierarchy of attributes (e.g. in a PERIOD dimension, this hierarchy could be day>week>month>quarter>year), a multipart key of day, week, month, quarter, year has to be created; thus, multiple meta-data definitions are required (one for each key component) to define a single relationship; this adds to the design complexity and degrades performance.

[0066] Addition or deletion of levels in the hierarchy will require physical modification of the fact table, which is a time-consuming process that limits flexibility.

[0067] Carrying all the segments of the compound dimensional key in the fact table increases the size of the index, thus impacting both performance and scalability.

[0068] Another performance issue arises from dimension tables that contain multiple hierarchies. In such cases, the dimensional table often includes a level of hierarchy indicator for every record. Every retrieval from a fact table that stores details and aggregates must use the indicator to obtain the correct result, which impacts performance. The best alternative to using the level indicator is the snowflake schema. In this schema, aggregate tables are created separately from the detail tables. In addition to the main fact tables, the snowflake schema contains separate fact tables for each level of aggregation. Notably, the snowflake schema is even more complicated than a star schema, and often requires multiple SQL statements to get the results that are required.

[0069] Another performance issue arises from the pairwise join problem. Traditional RDBMS engines are not designed for the rich set of complex queries that are issued against a star schema. The ability to retrieve related information from several tables in a single query (“join processing”) is severely limited. Many RDBMSs can join only two tables at a time. If a complex join involves more than two tables, the RDBMS needs to break the query into a series of pairwise joins. Selecting the order of these joins has a dramatic performance impact. There are optimizers that spend a lot of CPU cycles to find the best order in which to execute those joins. Unfortunately, because the number of combinations to be evaluated grows exponentially with the number of tables being joined, the problem of selecting the best order of pairwise joins rarely can be solved in a reasonable amount of time.
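The growth in the number of candidate join orders can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that only left-deep plans are considered, in which each pairwise join adds one more table to the running result, so that n tables admit n! orderings; real optimizers may count or prune plans differently, and the function name `left_deep_orders` is hypothetical.

```python
from math import factorial

def left_deep_orders(n_tables):
    """Number of orderings of n tables (assumed left-deep plan space)."""
    return factorial(n_tables)

# Candidate orderings for a few illustrative table counts.
counts = {n: left_deep_orders(n) for n in (2, 4, 8, 12)}
```

Under this assumption, a query over four tables has 24 candidate orders, while a query over twelve tables has 479,001,600.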

[0070] Moreover, because the number of combinations is often too large, optimizers limit the selection on the basis of a criterion of directly related tables. In a star schema, the fact table is the only table directly related to most other tables, meaning that the fact table is a natural candidate for the first pairwise join. Unfortunately, the fact table is the very largest table in the query, so this strategy leads to selecting a pairwise join order that generates a very large intermediate result set, severely affecting query performance.

[0071] There is an optimization strategy, typically referred to as Cartesian Joins, that lessens the performance impact of the pairwise join problem by allowing joining of unrelated tables. The join to the fact table, which is the largest one, is deferred until the very end, thus reducing the size of intermediate result sets. In a join of two unrelated tables, every combination of the two tables' rows is produced (a Cartesian product). Such a Cartesian product improves query performance. However, this strategy is viable only if the Cartesian product of dimension rows selected is much smaller than the number of rows in the fact table. The multiplicative nature of the Cartesian join makes the optimization helpful only for relatively small databases.
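The viability condition stated above may be illustrated numerically. In the following sketch, the function names and the threshold `ratio` are assumptions introduced only to make “much smaller than” concrete; the source does not specify any particular threshold.

```python
from math import prod

def cartesian_size(selected_rows_per_dimension):
    """Rows produced by joining the selected rows of unrelated dimension tables."""
    return prod(selected_rows_per_dimension)

def cartesian_join_viable(selected_rows_per_dimension, fact_rows, ratio=0.01):
    """Hypothetical test: the product must be at most `ratio` of the fact table size."""
    return cartesian_size(selected_rows_per_dimension) <= ratio * fact_rows

small = cartesian_join_viable([10, 20, 5], fact_rows=1_000_000)
large = cartesian_join_viable([1000, 1000, 100], fact_rows=1_000_000)
```

Selecting 10, 20 and 5 rows from three dimensions yields only 1,000 combinations, whereas selecting 1,000, 1,000 and 100 rows yields 100,000,000 combinations, which exceeds the assumed fact table size.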

[0072] In addition, systems that exploit hardware and software parallelism have been developed that lessen the performance issues set forth above. Parallelism can help reduce the execution time of a single query (speed-up), or handle additional work without degrading execution time (scale-up). For example, Red Brick™ has developed STARjoin™ technology that provides high speed, parallelizable multi-table joins in a single pass, thus allowing more than two tables to be joined in a single operation. The core technology is an innovative approach to indexing that accelerates multiple joins. Unfortunately, parallelism can only reduce, not eliminate, the performance degradation issues related to the star schema.

[0073] One of the most fundamental principles of the multidimensional database is the idea of aggregation. The most common aggregation is called a roll-up aggregation. This type is relatively easy to compute: e.g. taking daily sales totals and rolling them up into a monthly sales table. More difficult are analytical calculations, i.e. the aggregation of Boolean and comparative operators. However, these are also considered a subset of aggregation.
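The roll-up of daily sales totals into a monthly sales table may be sketched as follows. The dates, amounts and the function `roll_up_to_month` are hypothetical; ISO-formatted date strings are assumed so that the first seven characters identify the month.

```python
from collections import defaultdict

# Hypothetical daily sales totals keyed by ISO date (YYYY-MM-DD).
daily_sales = {"2000-01-30": 10, "2000-01-31": 15, "2000-02-01": 7}

def roll_up_to_month(daily):
    """Sum daily totals into one total per month (key YYYY-MM)."""
    monthly = defaultdict(int)
    for day, amount in daily.items():
        monthly[day[:7]] += amount
    return dict(monthly)

monthly_sales = roll_up_to_month(daily_sales)
```

The same pattern applies one level higher in the hierarchy, e.g. rolling monthly totals up into quarterly totals.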

[0074] In a star schema, the results of aggregation are summary tables. Typically, summary tables are generated by database administrators who attempt to anticipate the data aggregations that the users will request, and then pre-build such tables. In such systems, when processing a user-generated query that involves aggregation operations, the pre-built aggregated data that matches the query is retrieved from the summary tables (if such data exists). FIGS. 18A and 18B illustrate a multi-dimensional relational database using a star schema and summary tables. In this example, the summary tables are generated over the “time” dimension storing aggregated data for “month”, “quarter” and “year” time periods as shown in FIG. 18B. Summary tables are in essence additional fact tables, of higher levels. They are attached to the basic fact table creating a snowflake extension of the star schema. There are hierarchies among summary tables because users at different levels of management require different levels of summarization. Choosing the level of aggregation is accomplished via the “drill-down” feature.

[0075] Summary tables containing pre-aggregated results typically provide for improved query response time with respect to on-the-fly aggregation. However, summary tables suffer from some disadvantages:

[0076] summary tables require that database administrators anticipate the data aggregation operations that users will require; this is a difficult task in large multi-dimensional databases (for example, in data warehouses and data mining systems), where users always need to query in new ways looking for new information and patterns.

[0077] summary tables do not provide a mechanism that allows efficient drill down to view the raw data that makes up the summary table; typically a table scan of one or more large tables is required.

[0078] querying is delayed until pre-aggregation is completed.

[0079] there is a heavy time overhead because the vast majority of the generated information remains unvisited.

[0080] there is a need to synchronize the summary tables before use.

[0081] the degree of viable parallelism is limited because the subsequent levels of summary tables must be computed in a pipeline, due to their hierarchies.

[0082] for very large databases, this option is not viable because of the time and storage space required.

[0083] Note that it is common to utilize both pre-aggregated results and on-the-fly aggregation in support of aggregation. In these systems, partial pre-aggregation of the facts results in a small set of summary tables. On-the-fly aggregation is used in the case where the required aggregated data does not exist in the summary tables.

[0084] Note that in the event that the aggregated data does not exist in the summary tables, table join operations and aggregation operations are performed over the raw facts in order to generate such aggregated data. This is typically referred to as on-the-fly aggregation. In such instances, aggregation indexing is used to mitigate the performance cost of multiple data joins associated with dynamic aggregation of the raw data. Nevertheless, in large multi-dimensional databases, such dynamic aggregation may lead to unacceptable query response times.
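The combined use of partial pre-aggregation and on-the-fly aggregation may be sketched as follows. The fact rows, the partially populated summary table and the functions `quarter_of` and `quantity_for_quarter` are hypothetical and serve only to show the lookup-then-fallback pattern; no claim is made that any particular prior art system is implemented this way.

```python
# Hypothetical raw facts: (month as YYYY-MM, quantity).
facts = [("2000-01", 10), ("2000-01", 15), ("2000-02", 7), ("2000-04", 4)]

def quarter_of(month):
    """Map a YYYY-MM month key to its YYYY-Qn quarter key."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"

# Partial pre-aggregation: only the first quarter has been summarized.
summary_by_quarter = {"2000-Q1": 32}

def quantity_for_quarter(quarter):
    """Return (total, source): use the summary table if possible, else scan the facts."""
    if quarter in summary_by_quarter:
        return summary_by_quarter[quarter], "summary"
    total = sum(qty for month, qty in facts if quarter_of(month) == quarter)
    return total, "on-the-fly"

q1 = quantity_for_quarter("2000-Q1")
q2 = quantity_for_quarter("2000-Q2")
```

The first request is answered from the pre-built summary table; the second requires a scan over the raw facts, which is the step that becomes costly for large databases.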

[0085] In view of the problems associated with joining and aggregation within RDBMS, prior art ROLAP systems have suffered from essentially the same shortcomings and drawbacks as their underlying RDBMS.

[0086] While prior art MOLAP systems provide for improved access time to aggregated data within their underlying MDD structures, and have performance advantages when carrying out joining and aggregation operations, prior art MOLAP architectures have suffered from a number of shortcomings and drawbacks. More specifically, atomic (raw) data is moved, in a single transfer, to the MOLAP system for aggregation, analysis and querying. Importantly, the aggregation results are external to the DBMS. Thus, users of the DBMS cannot directly view these results. Such results are accessible only from the MOLAP system. Because the MDD query processing logic in prior art MOLAP systems is separate from that of the DBMS, users must procure rights to access the MOLAP system and be instructed (and be careful to conform to such instructions) to access the MDD (or the DBMS) under certain conditions. Such requirements can present security issues, highly undesirable for system administration. Satisfying such requirements is a costly and logistically cumbersome process. As a result, the widespread applicability of MOLAP systems has been limited.

[0087] Thus, there is a great need in the art for an improved mechanism for joining and aggregating data elements within a database management system (e.g., RDBMS), and for integrating the improved database management system (e.g., RDBMS) into informational database systems (including the data warehouse and OLAP domains), while avoiding the shortcomings and drawbacks of prior art systems and methodologies.

SUMMARY AND OBJECTS OF PRESENT INVENTION

[0088] Accordingly, it is a further object of the present invention to provide an improved method of and system for managing data elements within a multidimensional database (MDDB) using a novel stand-alone (i.e. external) data aggregation server, achieving a significant increase in system performance (e.g. decreased access/search time) using a stand-alone scalable data aggregation server.

[0089] Another object of the present invention is to provide such a system, wherein the stand-alone aggregation server includes an aggregation engine that is integrated with an MDDB, to provide a cartridge-style plug-in accelerator which can communicate with virtually any conventional OLAP server.

[0090] Another object of the present invention is to provide such a stand-alone data aggregation server whose computational tasks are restricted to data aggregation, leaving all other OLAP functions to the MOLAP server and therefore complementing the OLAP server's functionality.

[0091] Another object of the present invention is to provide such a system, wherein the stand-alone aggregation server carries out an improved method of data aggregation within the MDDB which enables the dimensions of the MDDB to be scaled up to large numbers and large atomic (i.e. base) data sets to be handled within the MDDB.

[0092] Another object of the present invention is to provide such a stand-alone aggregation server, wherein the aggregation engine supports high-performance aggregation (i.e. data roll-up) processes to maximize query performance of large data volumes, and to reduce the time of partial aggregations that degrades the query response.

[0093] Another object of the present invention is to provide such a stand-alone, external scalable aggregation server, wherein its integrated data aggregation (i.e. roll-up) engine speeds up the aggregation process by orders of magnitude, enabling larger database analysis by lowering the aggregation times.

[0094] Another object of the present invention is to provide such a novel stand-alone scalable aggregation server for use in OLAP operations, wherein the scalability of the aggregation server enables (i) the speed of the aggregation process carried out therewithin to be substantially increased by distributing the computationally intensive tasks associated with data aggregation among multiple processors, and (ii) the large data sets contained within the MDDB of the aggregation server to be subdivided among multiple processors thus allowing the size of atomic (i.e. basic) data sets within the MDDB to be substantially increased.

[0095] Another object of the present invention is to provide such a novel stand-alone scalable aggregation server, which provides for uniform load balancing among processors for high efficiency and best performance, and linear scalability for extending the limits by adding processors.

[0096] Another object of the present invention is to provide a stand-alone, external scalable aggregation server, which is suitable for MOLAP as well as for ROLAP system architectures.

[0097] Another object of the present invention is to provide a novel stand-alone scalable aggregation server, wherein an MDDB and aggregation engine are integrated and the aggregation engine carries out a high-performance aggregation algorithm and novel storing and searching methods within the MDDB.

[0098] Another object of the present invention is to provide a novel stand-alone scalable aggregation server which can be supported on single-processor (i.e. sequential or serial) computing platforms, as well as on multi-processor (i.e. parallel) computing platforms.

[0099] Another object of the present invention is to provide a novel stand-alone scalable aggregation server which can be used as a complementary aggregation plug-in to existing MOLAP and ROLAP databases.

[0100] Another object of the present invention is to provide a novel stand-alone scalable aggregation server which carries out novel rollup (i.e. down-up) and spread down (i.e. top-down) aggregation algorithms.

[0101] Another object of the present invention is to provide a novel stand-alone scalable aggregation server which includes an integrated MDDB and aggregation engine which carries out full pre-aggregation and/or “on-the-fly” aggregation processes within the MDDB.

[0102] Another object of the present invention is to provide such a novel stand-alone scalable aggregation server which is capable of supporting an MDDB having a multi-hierarchy dimensionality.

[0103] Another object of the present invention is to provide a novel method of aggregating multidimensional data of atomic data sets originating from a RDBMS Data Warehouse.

[0104] Another object of the present invention is to provide a novel method of aggregating multidimensional data of atomic data sets originating from other sources, such as external ASCII files, MOLAP server, or other end user applications.

[0105] Another object of the present invention is to provide a novel stand-alone scalable data aggregation server which can communicate with any MOLAP server via standard ODBC, OLE DB or DLL interface, in a completely transparent manner with respect to the (client) user, without any time delays in queries, equivalent to storage in MOLAP server's cache.

[0106] Another object of the present invention is to provide a novel “cartridge-style” (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of MOLAP into large-scale applications including Banking, Insurance, Retail and Promotion Analysis.

[0107] Another object of the present invention is to provide a novel “cartridge-style” (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of high-volatility type ROLAP applications such as, for example, the precalculation of data to maximize query performance.

[0108] Another object of the present invention is to provide a generic plug-in cartridge-type data aggregation component, suitable for all MOLAP systems of different vendors, dramatically reducing their aggregation burdens.

[0109] Another object of the present invention is to provide a novel high performance cartridge-type data aggregation server which, having standardized interfaces, can be plugged into the OLAP system of virtually any user or vendor.

[0110] Another object of the present invention is to provide a novel “cartridge-style” (stand-alone) scalable data aggregation engine which has the capacity to convert long batch-type data aggregations into interactive sessions.

[0111] In another aspect, it is an object of the present invention to provide an improved method of and system for joining and aggregating data elements integrated within a database management system (DBMS) using a non-relational multi-dimensional data structure (MDDB), achieving a significant increase in system performance (e.g. decreased access/search time), user flexibility and ease of use.

[0112] Another object of the present invention is to provide such a DBMS wherein its integrated data aggregation module supports high-performance aggregation (i.e. data roll-up) processes to maximize query performance of large data volumes.

[0113] Another object of the present invention is to provide such a DBMS system, wherein its integrated data aggregation (i.e. roll-up) module speeds up the aggregation process by orders of magnitude, enabling larger database analysis by lowering the aggregation times.

[0114] Another object of the present invention is to provide such a novel DBMS system for use in OLAP operations.

[0115] Another object of the present invention is to provide a novel DBMS system having an integrated aggregation module that carries out novel rollup (i.e. down-up) and spread down (i.e. top-down) aggregation algorithms.

[0116] Another object of the present invention is to provide a novel DBMS system having an integrated aggregation module that carries out full pre-aggregation and/or “on-the-fly” aggregation processes.

[0117] Another object of the present invention is to provide a novel DBMS system having an integrated aggregation module which is capable of supporting a MDDB having a multi-hierarchy dimensionality.

[0118] These and other objects of the present invention will become apparent hereinafter and in the claims to Invention set forth herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0119] In order to more fully appreciate the objects of the present invention, the following Detailed Description of the Illustrative Embodiments should be read in conjunction with the accompanying Drawings, wherein:

[0120] FIG. 1A is a schematic representation of an exemplary prior art relational on-line analytical processing (ROLAP) system comprising a three-tier or layer client/server architecture, wherein the first tier has a database layer utilizing an RDBMS for data storage, access, and retrieval processes, the second tier has an application logic layer (i.e. the ROLAP engine) for executing the multidimensional reports from multiple users, and the third tier integrates the ROLAP engine with a variety of presentation layers, through which users perform OLAP analyses;

[0121] FIG. 1B is a schematic representation of a generalized embodiment of a prior art multidimensional on-line analytical processing (MOLAP) system comprising a base data loader for receiving atomic (i.e. base) data from a Data Warehouse realized by a RDBMS, an OLAP multidimensional database (MDDB), an aggregation, access and retrieval module, application logic module and presentation module associated with a conventional OLAP server (e.g. Oracle's Express Server) for supporting on-line transactional processing (OLTP) operations on the MDDB, to service database queries and requests from a plurality of OLAP client machines typically accessing the system from an information network (e.g. the Internet);

[0122] FIG. 2A is a schematic representation of the Data Warehouse shown in the prior art system of FIG. 1B comprising numerous data tables (e.g. T1, T2 . . . Tn) and data field links, and the OLAP multidimensional database shown in FIG. 1B, comprising a conventional page allocation table (PAT) with pointers pointing to the physical storage of variables in an information storage device;

[0123] FIG. 2B is a schematic representation of an exemplary three-dimensional MDDB organized as a 3-dimensional Cartesian cube and used in the prior art system of FIG. 2A, wherein the first dimension of the MDDB is representative of geography (e.g. cities, states, countries, continents), the second dimension of the MDDB is representative of time (e.g. days, weeks, months, years), the third dimension of the MDDB is representative of products (e.g. all products, by manufacturer), and the basic data element is a set of variables which are addressed by 3-dimensional coordinate values;

[0124] FIG. 2C is a schematic representation of a prior art array structure associated with an exemplary three-dimensional MDDB, arranged according to a dimensional hierarchy;

[0125] FIG. 2D is a schematic representation of a prior art page allocation table for an exemplary three-dimensional MDDB, arranged according to pages of data element addresses;

[0126] FIG. 3A is a schematic representation of a prior art MOLAP system, illustrating the process of periodically storing raw data in the RDBMS Data Warehouse thereof, serially loading of basic data from the Data Warehouse to the MDDB, and the process of serially pre-aggregating (or pre-compiling) the data in the MDDB along the entire dimensional hierarchy thereof;

[0127] FIG. 3B is a schematic representation illustrating that the Cartesian addresses listed in a prior art page allocation table (PAT) point to where physical storage of data elements (i.e. variables) occurs in the information recording media (e.g. storage volumes) associated with the MDDB, during the loading of basic data into the MDDB as well as during data preaggregation processes carried out therewithin;

[0128] FIG. 3C1 is a schematic representation of an exemplary three-dimensional database used in a conventional MOLAP system of the prior art, showing that each data element contained therein is physically stored at a location in the recording media of the system which is specified by the dimensions (and subdimensions within the dimensional hierarchy) of the data variables which are assigned integer-based coordinates in the MDDB, and also that data elements associated with the basic data loaded into the MDDB are assigned lower integer coordinates in MDDB Space than pre-aggregated data elements contained therewithin;

[0129] FIG. 3C2 is a schematic representation illustrating that a conventional hierarchy of the dimension of “time” in the prior art typically contains the subdimensions “days, weeks, months, quarters, etc.”;

[0130] FIG. 3C3 is a schematic representation showing how data elements having higher subdimensions of time in the MDDB of the prior art are typically assigned increased integer addresses along the time dimension thereof;

[0131] FIG. 4 is a schematic representation illustrating that, for very large prior art MDDBs, very large page allocation tables (PATs) are required to represent the address locations of the data elements contained therein, and thus there is a need to employ address data paging techniques between the DRAM (e.g. program memory) and mass storage devices (e.g. recording discs or RAIDs) available on the serial computing platform used to implement such prior art MOLAP systems;

[0132] FIG. 5 is a graphical representation showing how search time in a conventional (i.e. prior art) MDDB increases in proportion to the amount of preaggregation of data therewithin;

[0133] FIG. 6A is a schematic representation of a generalized embodiment of a multidimensional on-line analytical processing (MOLAP) system of the present invention comprising a Data Warehouse realized as a relational database, a stand-alone Aggregation Server of the present invention having an integrated aggregation engine and MDDB, and an OLAP server supporting a plurality of OLAP clients, wherein the stand-alone Aggregation Server performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and multi-dimensional data storage functions;

[0134] FIG. 6B is a schematic block diagram of the stand-alone Aggregation Server of the illustrative embodiment shown in FIG. 6A, showing its primary components, namely, a base data interface (e.g. OLDB, OLE-DB, ODBC, SQL, JDBC, API, etc.) for receiving RDBMS flat files lists and other files from the Data Warehouse (RDBMS), a base data loader for receiving base data from the base data interface, configuration manager for managing the operation of the base data interface and base data loader, an aggregation engine and MDDB handler for receiving base data from the base loader, performing aggregation operations on the base data, and storing the base data and aggregated data in the MDDB; an aggregation client interface (e.g. OLDB, OLEDB, ODBC, SQL, JDBC, API, etc.) and input analyzer for receiving requests from OLAP client machines, cooperating with the aggregation engine and MDDB handler to generate aggregated data and/or retrieve aggregated data from the MDDB that pertains to the received requests, and returning this aggregated data back to the requesting OLAP clients; and a configuration manager for managing the operation of the input analyzer and the aggregation client interface.

[0135] FIG. 6C is a schematic representation of the software modules comprising the aggregation engine and MDDB handler of the stand-alone Aggregation Server of the illustrative embodiment of the present invention, showing a base data list structure being supplied to a hierarchy analysis and reorder module, the output thereof being transferred to an aggregation management module, the output thereof being transferred to a storage module via a storage management module, and a Query Directed Roll-up (QDR) aggregation management module being provided for receiving database (DB) requests from OLAP client machines (via the aggregation client interface) and managing the operation of the aggregation and storage management modules of the present invention;

[0136] FIG. 6D is a flow chart representation of the primary operations carried out by the (DB) request serving mechanism within the QDR aggregation management module shown in FIG. 6C;

[0137] FIG. 7A is a schematic representation of a separate-platform type implementation of the stand-alone Aggregation Server of the illustrative embodiment of FIG. 6B and a conventional OLAP server supporting a plurality of client machines, wherein base data from a Data Warehouse is shown being received by the aggregation server, realized on a first hardware/software platform (i.e. Platform A) and the stand-alone Aggregation Server is shown serving the conventional OLAP server, realized on a second hardware/software platform (i.e. Platform B), as well as serving data aggregation requirements of other clients supporting diverse applications such as spreadsheet, GUI front end, and applications;

[0138] FIG. 7B is a schematic representation of a shared-platform type implementation of the stand-alone Aggregation Server of the illustrative embodiment of FIG. 6B and a conventional OLAP server supporting a plurality of client machines, wherein base data from a Data Warehouse is shown being received by the stand-alone Aggregation Server, realized on a common hardware/software platform and the aggregation server is shown serving the conventional OLAP server, realized on the same common hardware/software platform, as well as serving data aggregation requirements of other clients supporting diverse applications such as spreadsheet, GUI front end, and applications;

[0139] FIG. 8A is a data table setting forth information representative of performance benchmarks obtained by the shared-platform type implementation of the stand-alone Aggregation Server of the illustrative embodiment serving the conventional OLAP server (i.e. Oracle EXPRESS Server) shown in FIG. 7B, wherein the common hardware/software platform is realized using a Pentium II 450 MHz, 1 GB RAM, 18 GB Disk, running the Microsoft NT operating system (OS);

[0140] FIG. 9A is a schematic representation of the first stage in the method of segmented aggregation according to the principles of the present invention, showing initial aggregation along the 1st dimension;

[0141] FIG. 9B is a schematic representation of the next stage in the method of segmented aggregation according to the principles of the present invention, showing that any segment along dimension 1, such as the shown slice, can be separately aggregated along the remaining dimensions, 2 and 3, and that in general, for an N dimensional system, the second stage involves aggregation in N-1 dimensions. The principle of segmentation can be applied on the first stage as well; however, only a sufficiently large data set will justify such a sliced procedure in the first dimension. Actually, it is possible to consider each segment as an N-1 cube, enabling recursive computation.

[0142] FIG. 9C1 is a schematic representation of the Query Directed Roll-up (QDR) aggregation method/procedure of the present invention, showing data aggregation starting from existing basic data or previously aggregated data in the first dimension (D1), and such aggregated data being utilized as a basis for QDR aggregation along the second dimension (D2);

[0143] FIG. 9C2 is a schematic representation of the Query Directed Roll-up (QDR) aggregation method/procedure of the present invention, showing initial data aggregation starting from existing previously aggregated data in the third dimension (D3), and continuing along the third dimension (D3), and thereafter continuing aggregation along the second dimension (D2);

[0144] FIG. 10A is a schematic representation of the “slice-storage” method of storing sparse data in the disk storage devices of the MDDB of FIG. 6B in accordance with the principles of the present invention, based on an ascending-ordered index along aggregation direction, enabling fast retrieval of data;

[0145] FIG. 10B is a schematic representation of the data organization of data files and the directory file used in the storages of the MDDB of FIG. 6B, and the method of searching for a queried data point therein using a simple binary search technique due to the data files' ascending order;

[0146] FIG. 11A is a schematic representation of three exemplary multi-hierarchical data structures for storage of data within the MDDB of FIG. 6B, having three levels of hierarchy, wherein the first level, representative of base data, is composed of items A, B, F, and G, the second level is composed of items C, E, H and I, and the third level is composed of a single item D, which is common to all three hierarchical structures;

[0147]FIG. 11B is a schematic representation of an optimizedmulti-hierarchical data structure merged from all three hierarchies ofFIG. 11A, in accordance with the principles of the present invention;

[0148] FIGS. 11C(i) through 11C(ix) represent a flow chart description(and accompanying data structures) of the operations of an exemplaryhierarchy transformation mechanism of the present invention thatoptimally merges multiple hierarchies into a single hierarchy that isfunctionally equivalent to the multiple hierarchies.

[0149]FIG. 12 is a schematic representation showing the levels ofoperations performed by the stand-alone Aggregation Server of FIG. 6B,summarizing the different enabling components for carrying out themethod of segmented aggregation in accordance with the principles of thepresent invention;

[0150]FIG. 13 is a schematic representation of the stand-aloneAggregation Server of the present invention shown as a component of acentral data warehouse, serving the data aggregation needs of URLdirectory systems, Data Marts, RDBMSs, ROLAP systems and OLAP systemsalike;

[0151] FIG. 14 is a schematic representation of a prior art information database system, wherein the present invention may be embodied;

[0152] FIG. 15 is a schematic representation of the prior art data warehouse and OLAP system, wherein the present invention may be embodied;

[0153] FIGS. 16A-16C are schematic representations of exemplary tables employed in a prior art Relational Database Management System (RDBMS); FIGS. 16B and 16C illustrate operators (queries) on the table of FIG. 16A, and the result of such queries, respectively;

[0154] FIG. 17A is a schematic representation of an exemplary dimensional schema (star schema) of a relational database;

[0155] FIG. 17B is a schematic representation of tables used to implement the schema shown in FIG. 17A;

[0156] FIG. 18A is a schematic representation of an exemplary multidimensional schema (star schema);

[0157] FIG. 18B is a schematic representation of tables used to implement the schema of FIG. 18A, including summary tables storing results of aggregation operations performed on the facts of the central fact table along the time-period dimension, in accordance with conventional teachings;

[0158] FIG. 19A is a schematic representation of an exemplary embodiment of a DBMS (for example, an RDBMS as shown) of the present invention comprising a relational datastore and an integrated multidimensional (MDD) aggregation module supporting queries from a plurality of clients, wherein the aggregation engine performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and non-relational multi-dimensional data storage functions.

[0159] FIG. 19B is a schematic block diagram of the MDD aggregation module of the illustrative embodiment of the present invention shown in FIG. 6A.

[0160] FIGS. 19C(i) and 19C(ii), taken together, set forth a flow chart representation of the primary operations carried out within the DBMS of the present invention when performing data aggregation and related support operations, including the servicing of user-submitted (e.g. natural language) queries made on such aggregated database of the present invention.

[0161] FIG. 19D is a flow chart representation of the primary operations carried out by the (DB) request serving mechanism within the MDD control module shown in FIG. 6B.

[0162] FIG. 19E is a schematic representation of the view mechanism of a DBMS that enables users to query on the aggregated data generated and/or stored in the MDD Aggregation module according to the present invention.

[0163] FIG. 19F is a schematic representation of the trigger mechanism of the DBMS that enables users to query on the aggregated data generated and/or stored in the MDD Aggregation module according to the present invention.

[0164] FIG. 19G is a schematic representation of the DBMS of the present invention, illustrating a logical partitioning into a relational part and a non-relational part. The relational part includes the relational data store (e.g., table(s) and dictionary) and support mechanisms (e.g., query handling services). The non-relational part includes the MDD Aggregation Module. Data flows bidirectionally between the relational part and the non-relational part as shown.

[0165] FIG. 20A shows a separate-platform type implementation of the DBMS system of the illustrative embodiment shown in FIG. 19A, wherein the relational datastore and support mechanisms (e.g., query handling, fact table(s) and dictionary of the DBMS) reside on a separate hardware platform and/or OS system from that used to run the MDD Aggregation Module of the present invention.

[0166] FIG. 20B shows a common-platform type implementation of the DBMS system of the illustrative embodiment shown in FIG. 19A, wherein the relational datastore and support mechanisms (e.g., query handling, fact table(s) and dictionary of the DBMS) share the same hardware platform and operating system (OS) that is used to run the MDD Aggregation Module of the present invention.

[0167] FIG. 21 is a schematic representation of the DBMS of the present invention shown as a component of a central data warehouse, serving the data storage and aggregation needs of a ROLAP system (or other OLAP system).

[0168] FIG. 22 is a schematic representation of the DBMS of the present invention shown as a component of a central data warehouse, wherein the DBMS includes integrated OLAP Analysis Logic (and preferably an integrated Presentation Module) that operates cooperatively with the query handling of the DBMS system and the MDD Aggregation Module to enable users of the DBMS system to execute multidimensional reports (e.g., ratios, ranks, transforms, dynamic consolidation, complex filtering, forecasts, query governing, scheduling, flow control, preaggregate inferencing, denormalization support, and/or table partitioning and joins) and preferably perform traditional OLAP analyses (grids, graphs, maps, alerts, drill-down, data pivot, data surf, slice and dice, print).

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS OF THE PRESENT INVENTION

[0169] Referring now to FIGS. 6A through 13, the preferred embodiments of the method and system of the present invention will now be described in great detail hereinbelow, wherein like elements in the Drawings shall be indicated by like reference numerals.

[0170] Throughout this disclosure, the terms “aggregation” and “preaggregation” shall be understood to mean the process of summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.

[0171] In general, the stand-alone aggregation server and methods of and apparatus for data aggregation of the present invention can be employed in a wide range of applications, including MOLAP systems, ROLAP systems, Internet URL-directory systems, personalized on-line e-commerce shopping systems, Internet-based systems requiring real-time control of packet routing and/or switching, and the like.

[0172] For purposes of illustration, initial focus will be accorded to improvements in MOLAP systems, in which knowledge workers are enabled to intuitively, quickly, and flexibly manipulate operational data within a MDDB using familiar business terms in order to provide analytical insight into a business domain of interest.

[0173] FIG. 6A illustrates a generalized embodiment of a multidimensional on-line analytical processing (MOLAP) system of the present invention comprising: a Data Warehouse realized as a relational database; a stand-alone cartridge-style Aggregation Server of the present invention having an integrated aggregation engine and a MDDB; and an OLAP server communicating with the Aggregation Server, and supporting a plurality of OLAP clients. In accordance with the principles of the present invention, the stand-alone Aggregation Server performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and multi-dimensional data storage functions.

[0174] Departing from conventional practices, the principles of the present invention teach moving the aggregation engine and the MDDB into a separate Aggregation Server having standardized interfaces so that it can be plugged into the OLAP server of virtually any user or vendor. This move removes the restricting dependency of aggregation on the analytical functions of OLAP, and allows novel and independent algorithms to be applied. The stand-alone data aggregation server enables efficient organization and handling of data, fast aggregation processing, and fast access to and retrieval of any data element in the MDDB.

[0175] As will be described in greater detail hereinafter, the Aggregation Server of the present invention can serve the data aggregation requirements of other types of systems besides OLAP systems such as, for example, URL directory management, Data Marts, RDBMS, or ROLAP systems.

[0176] The Aggregation Server of the present invention excels in performing two distinct functions, namely: the aggregation of data in the MDDB; and the handling of the resulting data base in the MDDB, for “on demand” client use. In the case of serving an OLAP server, the Aggregation Server of the present invention focuses on performing these two functions in a high performance manner (i.e. aggregating and storing base data, originated at the Data Warehouse, in a multidimensional storage (MDDB), and providing the results of this data aggregation process “on demand” to the clients, such as the OLAP server, spreadsheet applications, and end user applications). As such, the Aggregation Server of the present invention frees each conventional OLAP server, with which it interfaces, from the need of making data aggregations, and therefore allows the conventional OLAP server to concentrate on the primary functions of OLAP servers, namely: data analysis and supporting a graphical interface with the user client.

[0177] FIG. 6B shows the primary components of the stand-alone Aggregation Server of the illustrative embodiment, namely: a base data interface (e.g. OLDB, OLE-DB, ODBC, SQL, JDBC, API, etc.) for receiving RDBMS flat files lists and other files from the Data Warehouse (RDBMS); a base data loader for receiving base data from the base data interface; a configuration manager for managing the operation of the base data interface and base data loader; an aggregation engine for receiving base data from the base data loader; a multi-dimensional database (MDDB); a MDDB handler; an input analyzer; an aggregation client interface (e.g. OLDB, OLEDB, ODBC, SQL, API, JDBC, etc.); and a configuration manager for managing the operation of the input analyzer and the aggregation client interface.

[0178] During operation, the base data originates at a data warehouse or other sources, such as external ASCII files, a MOLAP server, or others. The Configuration Manager, in order to enable proper communication with all possible sources and data structures, configures two blocks, the Base Data Interface and Data Loader. Their configuration is matched with different standards such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc. As shown in FIG. 6B, the core of the data Aggregation Server of the present invention comprises: a data Aggregation Engine; a Multidimensional Data Handler (MDDB Handler); and a Multidimensional Data Storage (MDDB). The results of data aggregation are efficiently stored in the MDDB by the MDDB Handler.

[0179] As shown in FIGS. 6A and 6B, the stand-alone Aggregation Server of the present invention serves the OLAP Server (or other requesting computing system) via an aggregation client interface, which preferably conforms to standard interface protocols such as OLDB, OLEDB, ODBC, SQL, JDBC, an API, etc. Aggregation results required by the OLAP server are supplied on demand. Typically, the OLAP Server decomposes the query, via a parsing process, into a series of requests. Each such request, specifying an n-dimensional coordinate, is presented to the Aggregation Server. The Configuration Manager sets the Aggregation Client Interface and Input Analyzer for a proper communication protocol according to the client user. The Input Analyzer converts the input format to make it suitable for the MDDB Handler.

[0180] An object of the present invention is to make the transfer of data completely transparent to the OLAP user, in a manner which is equivalent to the storing of data in the MOLAP server's cache and without any query delays. This requires that the stand-alone Aggregation Server have exceptionally fast response characteristics. This object is enabled by providing the unique data structure and aggregation mechanism of the present invention.

[0181] FIG. 6C shows the software modules comprising the aggregation engine and MDDB handler components of the stand-alone Aggregation Server of the illustrative embodiment. The base data list, as it arrives from RDBMS or text files, has to be analyzed and reordered to optimize hierarchy handling, according to the unique method of the present invention, as described later with reference to FIGS. 11A and 11B.

[0182] The function of the aggregation management module is to administer the aggregation process according to the method illustrated in FIGS. 9A and 9B.

[0183] In accordance with the principles of the present invention, data aggregation within the stand-alone Aggregation Server can be carried out either as a complete pre-aggregation process, where the base data is fully aggregated before commencing querying, or as a query directed roll-up (QDR) process, where querying is allowed at any stage of aggregation using the “on-the-fly” data aggregation process of the present invention. The QDR process will be described hereinafter in greater detail with reference to FIGS. 9C1 and 9C2. A request (i.e. a basic component of a client query) is answered either by calling the Aggregation management module for “on-the-fly” data aggregation, or by accessing pre-aggregated result data via the Storage management module. The query/request serving mechanism of the present invention within the QDR aggregation management module is illustrated in the flow chart of FIG. 6D.

[0184] The function of the Storage management module is to handle multidimensional data in the storage(s) module in a very efficient way, according to the novel method of the present invention, which will be described in detail hereinafter with reference to FIGS. 10A and 10B.

[0185] The request serving mechanism shown in FIG. 6D is controlled by the QDR aggregation management module. Requests are queued and served one by one. If the required data is already pre-calculated, then it is retrieved by the storage management module and returned to the client. Otherwise, the required data is calculated “on-the-fly” by the aggregation management module, and the result is sent to the client while simultaneously being stored by the storage management module, shown in FIG. 6C.
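By way of illustration only, the request serving behavior described above may be sketched as follows. This is a minimal sketch and not the implementation of FIG. 6D: the class name RequestServer, its methods, the use of summation as the aggregation function, and the in-memory dictionary standing in for the storage management module are all assumptions made for the example.

```python
from collections import deque


class RequestServer:
    """Hypothetical stand-in for the QDR aggregation management module."""

    def __init__(self, base_data, children):
        # base_data: value of each base item (assumed already loaded).
        # children: aggregate item -> list of the items it is rolled up from.
        self.store = dict(base_data)   # plays the role of the storage management module
        self.children = children
        self.queue = deque()

    def submit(self, item):
        self.queue.append(item)        # requests are queued

    def _aggregate(self, item):
        # On-the-fly roll-up (summation is assumed); every result is stored as produced.
        if item not in self.store:
            self.store[item] = sum(self._aggregate(c) for c in self.children[item])
        return self.store[item]

    def serve_all(self):
        served = []
        while self.queue:              # requests are served one by one
            item = self.queue.popleft()
            if item in self.store:     # already calculated: retrieve from storage
                served.append((item, self.store[item], "stored"))
            else:                      # otherwise calculate on the fly, store, and return
                served.append((item, self._aggregate(item), "on-the-fly"))
        return served
```

In this sketch, a first request for an item that has not yet been aggregated triggers the roll-up, and any later request for the same item, or for an intermediate result produced along the way, is answered from storage.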

[0186] FIGS. 7A and 7B outline two different implementations of the stand-alone (cartridge-style) Aggregation Server of the present invention. In both implementations, the Aggregation Server supplies aggregated results to a client.

[0187] FIG. 7A shows a separate-platform type implementation of the MOLAP system of the illustrative embodiment shown in FIG. 6A, wherein the Aggregation Server of the present invention resides on a separate hardware platform and OS system from that used to run the OLAP server. In this type of implementation, it is even possible to run the Aggregation Server and the OLAP Server on different-type operating systems (e.g. NT, Unix, MAC OS).

[0188] FIG. 7B shows a common-platform type implementation of the MOLAP system of the illustrative embodiment shown in FIG. 6B, wherein the Aggregation Server of the present invention and OLAP Server share the same hardware platform and operating system (OS).

[0189] FIG. 8A shows a table setting forth the benchmark results of an aggregation engine, implemented on a shared/common hardware platform and OS, in accordance with the principles of the present invention. The common platform and OS is realized using a Pentium II 450 MHz, 1 GB RAM, 18 GB Disk, running the Microsoft NT operating system. The six (6) data sets shown in the table differ in number of dimensions, number of hierarchies, measure of sparsity and data size. A comparison with ORACLE Express, a major OLAP server, is made. It is evident that the aggregation engine of the present invention outperforms currently leading aggregation technology by more than an order of magnitude.

[0190] The segmented data aggregation method of the present invention is described in FIGS. 9A through 9C2. These figures outline a simplified setting of three dimensions only; however, the following analysis applies to any number of dimensions as well.

[0191] The data is divided into autonomous segments to minimize the amount of simultaneously handled data. The initial aggregation is performed on a single dimension only, while later on the aggregation process involves all other dimensions.

[0192] At the first stage of the aggregation method, an aggregation is performed along dimension 1. The first stage can be performed on more than one dimension. As shown in FIG. 9A, the space of the base data is expanded by the aggregation process.

[0193] In the next stage shown in FIG. 9B, any segment along dimension 1, such as the shown slice, can be separately aggregated along the remaining dimensions, 2 and 3. In general, for an N dimensional system, the second stage involves aggregation in N-1 dimensions.

[0194] The principle of data segmentation can be applied on the first stage as well. However, only a large enough data set will justify such a sliced procedure in the first dimension. Actually, it is possible to consider each segment as an N-1 cube, enabling recursive computation.
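The two stages described above may be sketched, under simplifying assumptions, as follows. The sparse cube is assumed to be a dictionary keyed by coordinate tuples, each dimension is assumed to have a single level of roll-up groups, and summation is assumed as the aggregation function; the function names roll_up and segmented_aggregate and the member names in the example are hypothetical.

```python
from collections import defaultdict


def roll_up(cells, axis, groups):
    """Return cells plus the aggregated cells along one axis.

    groups maps a parent member to the list of members it is summed from
    (a single hierarchy level is assumed for brevity).
    """
    out = dict(cells)
    for parent, members in groups.items():
        for coord, value in cells.items():
            if coord[axis] in members:
                key = coord[:axis] + (parent,) + coord[axis + 1:]
                out[key] = out.get(key, 0) + value
    return out


def segmented_aggregate(cells, groups_by_axis):
    # Stage 1: aggregate along dimension 1 (axis 0) only; the space is expanded.
    stage1 = roll_up(cells, 0, groups_by_axis[0])

    # Split the expanded space into independent slices, one per member of dimension 1.
    slices = defaultdict(dict)
    for coord, value in stage1.items():
        slices[coord[0]][coord] = value

    # Stage 2: each slice is aggregated separately along the remaining N-1 dimensions.
    result = {}
    for slice_cells in slices.values():
        for axis in range(1, len(groups_by_axis)):
            slice_cells = roll_up(slice_cells, axis, groups_by_axis[axis])
        result.update(slice_cells)
    return result


cells = {("jan", "tv", "nyc"): 1, ("feb", "tv", "nyc"): 2, ("jan", "radio", "la"): 4}
groups = [
    {"q1": ["jan", "feb"]},
    {"all_products": ["tv", "radio"]},
    {"usa": ["nyc", "la"]},
]
cube = segmented_aggregate(cells, groups)
```

Because each slice in the second stage only reads and writes its own cells, the slices in this sketch could be processed in any order, which is the property relied upon by the QDR process.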

[0195] It is often imperative to get aggregation results of a specific slice before the entire aggregation is completed, or alternatively, to have the roll-up done in a particular sequence. A novel feature of the aggregation method of the present invention is that it allows querying to begin even before the regular aggregation process is completed, while still providing fast response. Moreover, in relational OLAP and other systems requiring only partial aggregations, the QDR process dramatically speeds up the query response.

[0196] The QDR process is made feasible by the slice-oriented roll-up method of the present invention. After aggregating the first dimension(s), the multidimensional space is composed of independent multidimensional cubes (slices). These cubes can be processed in any arbitrary sequence.

[0197] Consequently, the aggregation process of the present invention can be monitored by means of files, shared memory, sockets, or queues to statically or dynamically set the roll-up order.
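As a hedged illustration of how the roll-up order of independent slices might be set dynamically, the following sketch uses an in-process queue. The class RollUpScheduler and its methods are hypothetical names; the disclosure mentions files, shared memory, sockets, and queues but does not prescribe this particular mechanism.

```python
from collections import deque


class RollUpScheduler:
    """Hypothetical scheduler that reorders pending slices when queries arrive."""

    def __init__(self, slice_ids):
        self.pending = deque(slice_ids)   # static default roll-up order
        self.done = []

    def prioritize(self, slice_id):
        # A query needs this slice: move it to the front if it is still pending.
        if slice_id in self.pending:
            self.pending.remove(slice_id)
            self.pending.appendleft(slice_id)

    def run(self, aggregate_slice, requests=()):
        requests = deque(requests)
        while self.pending:
            if requests:                  # one incoming request is examined per step
                self.prioritize(requests.popleft())
            slice_id = self.pending.popleft()
            aggregate_slice(slice_id)     # slices are independent, so any order is valid
            self.done.append(slice_id)
        return self.done
```

For example, with slices "jan", "feb", "mar" and "q1" pending and requests arriving for "q1" and then "mar", this sketch would process "q1" and "mar" first and then fall back to the static order for the rest.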

[0198] In order to satisfy a single query coming from a client, before the required aggregation result has been prepared, the QDR process of the present invention involves performing a fast on-the-fly aggregation (roll-up) involving only a thin slice of the multidimensional data.

[0199] FIG. 9C1 shows a slice required for building up a roll-up result of the 2nd dimension. In case 1, as shown, the aggregation starts from existing data, either basic or previously aggregated in the first dimension. This data is utilized as a basis for QDR aggregation along the second dimension. In case 2, due to lack of previous data, a QDR involves an initial slice aggregation along dimension 3, and thereafter aggregation along the 2nd dimension.

[0200] FIG. 9C2 shows two corresponding QDR cases for gaining results in the 3rd dimension. Cases 1 and 2 differ in the amount of initial aggregation required in the 2nd dimension.

[0201] FIG. 10A illustrates the “Slice-Storage” method of storing sparse data on storage disks. In general, this data storage method is based on the principle that an ascending-ordered index along the aggregation direction enables fast retrieval of data. FIG. 10A illustrates a unit-wide slice of the multidimensional cube of data. Since the data is sparse, only a few non-NA data points exist. These points are indexed as follows. The Data File consists of data records, in which each n-1 dimensional slice is stored in a separate record. These records have a varying length, according to the amount of non-NA stored points. For each registered point in the record, IND_k stands for an index in an n-dimensional cube, and Data stands for the value of a given point in the cube.

[0202] FIG. 10B illustrates a novel method for randomly searching for a queried data point in the MDDB of FIG. 6B by using a novel technique of organizing data files and the directory file used in the storages of the MDDB, so that a simple binary search technique can then be employed within the Aggregation Server of the present invention. According to this method, a metafile termed DIR File keeps pointers to Data Files as well as additional parameters such as the start and end addresses of the data record (IND_0, IND_n), its location within the Data File, record size (n), the file's physical address on disk (D_Path), and auxiliary information on the record (Flags).

[0203] A search for a queried data point is then performed by an access to the DIR file. The search along the file can be made using a simple binary search due to the file's ascending order. When the record is found, it is then loaded into main memory to search for the required point, characterized by its index IND_k. The attached Data field represents the queried value. In case the exact index is not found, it means that the point is NA.
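The directory-based search described above may be sketched in memory as follows. This is only an illustration: the class name SliceStore is hypothetical, each index is assumed to be a single linearized integer, the D_Path and Flags fields are omitted, and the choice to also use a binary search within the loaded record is an assumption, since the text does not specify how the record itself is searched.

```python
from bisect import bisect_left, bisect_right


class SliceStore:
    """Hypothetical in-memory model of the Data File records and the DIR file."""

    def __init__(self, records):
        # records: non-empty lists of (index, value) pairs, each in ascending index
        # order, with the records themselves ordered by ascending index range.
        self.records = records
        # One directory entry per record: (IND_0, IND_n, position of the record).
        self.directory = [(rec[0][0], rec[-1][0], pos) for pos, rec in enumerate(records)]
        self._starts = [entry[0] for entry in self.directory]

    def lookup(self, index):
        # Binary search over the ascending directory for the candidate record.
        pos = bisect_right(self._starts, index) - 1
        if pos < 0:
            return None                      # before the first record: NA
        _, last, rec_pos = self.directory[pos]
        if index > last:
            return None                      # falls between two records: NA
        record = self.records[rec_pos]       # "load" the record into main memory
        keys = [k for k, _ in record]
        i = bisect_left(keys, index)         # search within the record (assumed binary)
        if i < len(keys) and keys[i] == index:
            return record[i][1]              # the attached Data field
        return None                          # exact index not stored: NA
```

Only non-NA points are stored in this sketch, so a lookup that finds no exact index returns None, corresponding to an NA point.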

[0204] In another aspect of the present invention, a novel method is provided for optimally merging multiple hierarchies in multi-hierarchical structures. The method, illustrated in FIGS. 11A, 11B, and 11C, is preferably used by the Aggregation Server of the present invention in processing the table data (base data), as it arrives from the RDBMS.

[0205] According to the devised method, the inner order of hierarchies within a dimension is optimized, to achieve efficient data handling for summations and other mathematical formulas (termed in general “Aggregation”). The order of hierarchy is defined externally. It is brought from a data source to the stand-alone aggregation engine, as a descriptor of data, before the data itself. In the illustrative embodiment, the method assumes hierarchical relations of the data, as shown in FIG. 11A. The way data items are ordered in the memory space of the Aggregation Server, with regard to the hierarchy, has a significant impact on its data handling efficiency.

[0206] Notably, when using prior art techniques, multiple handling of data elements, which occurs when a data element is accessed more than once during the aggregation process, has been hitherto unavoidable when the main concern is to effectively handle the sparse data. The data structures used in prior art data handling methods have been designed for fast access to non-NA data. According to prior art techniques, each access is associated with a time-consuming search and retrieval in the data structure. For the massive amount of data typically accessed from a Data Warehouse in an OLAP application, such multiple handling of data elements has significantly degraded the efficiency of prior art data aggregation processes. When using prior art data handling techniques, the data element D shown in FIG. 11A must be accessed three times, causing poor aggregation performance.

[0207] In accordance with the data handling method of the present invention, the data is pre-ordered for a singular handling, as opposed to the multiple handling taught by prior art methods. According to the present invention, elements of base data and their aggregated results are contiguously stored in a way that each element will be accessed only once. This particular order allows a forward-only handling, never backward. Once a base data element is stored, or an aggregated result is generated and stored, it is never to be retrieved again for further aggregation. As a result the storage access is minimized. This way of singular handling greatly elevates the aggregation efficiency of large data bases. An efficient handling method as used in the present invention is shown in FIG. 7A. The data element D, as any other element, is accessed and handled only once. FIG. 11A shows an example of a multi-hierarchical database structure having 3 hierarchies. As shown, the base data has a dimension that includes items A, B, F, and G. The second level is composed of items C, E, H and I. The third level has a single item D, which is common to all three hierarchical structures. In accordance with the method of the present invention, a minimal computing path is always taken. For example, according to the method of the present invention, item D will be calculated as part of structure 1, requiring two mathematical operations only, rather than as in structure 3, which would need four mathematical operations. FIG. 11B depicts an optimized structure merged from all three hierarchies.

[0208] FIGS. 11C(i) through 11C(ix) represent a flow chart description (and accompanying data structures) of the operations of an exemplary hierarchy transformation mechanism of the present invention that optimally merges multiple hierarchies into a single hierarchy that is functionally equivalent to the multiple hierarchies. For the sake of description, the data structures correspond to exemplary hierarchical structures described above with respect to FIGS. 11A and 11B. As illustrated in FIG. 11C(i), in step 1101, a catalogue is loaded from the DBMS system. As is conventional, the catalogue includes data (“hierarchy descriptor data”) describing multiple hierarchies for at least one dimension of the data stored in the DBMS. In step 1103, this hierarchy descriptor data is extracted from the catalogue. A loop (steps 1105-1119) is performed over the items in the multiple hierarchy described by the hierarchy descriptor data.

[0209] In the loop 1105-1119, a given item in the multiple hierarchy is selected (step 1107); and, in step 1109, the parent(s) (if any), including grandparents, great-grandparents, etc., of the given item are identified and added to an entry (for the given item) in a parent list data structure, which is illustrated in FIG. 11C(v). Each entry in the parent list corresponds to a specific item and includes zero or more identifiers for items that are parents (or grandparents, or great-grandparents) of the specific item. In addition, an inner loop (steps 1111-1117) is performed over the hierarchies of the multiple hierarchies described by the hierarchy descriptor data, wherein in step 1113 one of the multiple hierarchies is selected. In step 1115, the child of the given item in the selected hierarchy (if any) is identified and added (if need be) to a group of identifiers in an entry (for the given item) in a child list data structure, which is illustrated in FIG. 11C(vi). Each entry in the child list corresponds to a specific item and includes zero or more groups of identifiers each identifying a child of the specific item. Each group corresponds to one or more of the hierarchies described by the hierarchy descriptor data.

[0210] The operation then continues to steps 1121 and 1123 as illustrated in FIG. 11C(ii) to verify the integrity of the multiple hierarchies described by the hierarchy descriptor data (step 1121) and fix (or report to the user) any errors discovered therein (step 1123). Preferably, the integrity of the multiple hierarchies is verified in step 1121 by iteratively expanding each group of identifiers in the child list to include the children, grandchildren, etc. of any item listed in the group. If the child(ren) for each group for a specific item do not match, a verification error is encountered, and such error is fixed (or reported to the user) (step 1123). The operation then proceeds to a loop (steps 1125-1133) over the items in the child list.
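The construction of the parent list and child list, and the integrity check, may be sketched as follows. The sketch rests on several assumptions: the hierarchy descriptor data is represented as a dictionary mapping a hierarchy name to a mapping from each parent to its direct children; one child group is kept per hierarchy rather than merging identical groups; the hierarchies are acyclic; and "matching" is interpreted as comparing the fully expanded base-level items of each group. The example hierarchies h1 and h2 reuse the item names of FIG. 11A but are hypothetical and are not the structures of that figure.

```python
def descendants(tree, item):
    """All children, grandchildren, etc. of item within one hierarchy."""
    found = set()
    for kid in tree.get(item, []):
        found |= {kid} | descendants(tree, kid)
    return found


def build_lists(hierarchies):
    """Return (parent_list, child_list) for all items of all hierarchies."""
    items = set()
    for tree in hierarchies.values():
        for parent, kids in tree.items():
            items |= {parent, *kids}
    parent_list = {item: set() for item in items}   # item -> all of its ancestors
    child_list = {item: {} for item in items}       # item -> {hierarchy: direct children}
    for name, tree in hierarchies.items():
        for parent, kids in tree.items():
            child_list[parent][name] = set(kids)
            for d in descendants(tree, parent):
                parent_list[d].add(parent)
    return parent_list, child_list


def verify(hierarchies):
    """Return the items whose expanded child groups disagree between hierarchies."""
    errors = []
    parents = {p for tree in hierarchies.values() for p in tree}
    for item in sorted(parents):
        expanded = {
            frozenset(d for d in descendants(tree, item) if d not in tree)
            for tree in hierarchies.values() if item in tree
        }
        if len(expanded) > 1:
            errors.append(item)
    return errors


hierarchies = {
    "h1": {"D": ["C", "H"], "C": ["A", "B"], "H": ["F", "G"]},
    "h2": {"D": ["E", "I"], "E": ["A", "F"], "I": ["B", "G"]},
}
parent_list, child_list = build_lists(hierarchies)
```

In this example both hierarchies expand item D to the same base items, so verify reports no error; removing G from item I in h2 would cause D to be reported.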

[0211] In the loop (steps 1125-1133), a given item in the child list is identified in step 1127. In step 1129, the entry in the child list for the given item is examined to determine if the given item has no children (e.g., the corresponding entry is null). If so, the operation continues to step 1131 to add an entry for the item in level 0 of an ordered list data structure, which is illustrated in FIG. 11C(vii); otherwise the operation continues to process the next item of the child list in the loop. Each entry in a given level of the ordered list corresponds to a specific item and includes zero or more identifiers each identifying a child of the specific item. The levels of the ordered list describe the transformed hierarchy as will readily become apparent in light of the following. Essentially, loop 1125-1133 builds the lowest level (level 0) of the transformed hierarchy.

[0212] After loop 1125-1133, operation continues to process the lowest level to derive the next higher level, and iterates over this process to build out the entire transformed hierarchy. More specifically, in step 1135, a “current level” variable is set to identify the lowest level. In step 1137, the items of the “current level” of the ordered list are copied to a work list. In step 1139, it is determined if the work list is empty. If so, the operation ends; otherwise operation continues to step 1141 wherein a loop (steps 1141-1159) is performed over the items in the work list.

[0213] In step 1143, a given item in the work list is identified and operation continues to an inner loop (steps 1145-1155) over the parent(s) of the given item (which are specified in the parent list entry for the given item). In step 1147 of the inner loop, a given parent of the given item is identified. In step 1149, it is determined whether any other parent (e.g., a parent other than the given parent) of the given item is a child of the given parent (as specified in the child list entry for the given parent). If so, operation continues to step 1155 to process the next parent of the given item in the inner loop; otherwise, operation continues to steps 1151 and 1153. In step 1151, an entry for the given parent is added to the next level (current level+1) of the ordered list, if it does not exist there already. In step 1153, if no child of the given item (as specified in the entry for the given item in the current level of the ordered list) matches (e.g., is covered by) any child (or grandchild or great grandchild etc.) of item(s) in the entry for the given parent in the next level of the ordered list, the given item is added to the entry for the given parent in the next level of the ordered list. Levels 1 and 2 of the ordered list for the example described above are shown in FIGS. 11C(viii) and 11C(ix), respectively. The children (including grandchildren, great grandchildren, etc.) of an item in the entry for a given parent in the next level of the ordered list may be identified by the information encoded in the lower levels of the ordered list. After step 1153, operation continues to step 1155 to process the next parent of the given item in the inner loop (steps 1145-1155).

[0214] After processing the inner loop (steps 1145-1155), operation continues to step 1157 to delete the given item from the work list, and processing continues to step 1159 to process the next item of the work list in the loop (steps 1141-1159).

[0215] After processing the loop (steps 1141-1159), the ordered list (e.g., transformed hierarchy) has been built for the next higher level. The operation continues to step 1161 to increment the current level to the next higher level, and operation returns (in step 1163) to step 1137 to build the next higher level, until the highest level is reached (determined in step 1139) and the operation ends.
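One possible reading of the level-building loop (steps 1125 through 1163) is sketched below. It is an interpretation made for illustration and not the flow chart of FIGS. 11C(i) through 11C(ix): the function name transform, the dictionary representations of the parent list and child list, the alphabetical visiting order, and the example data (hypothetical hierarchies reusing the item names of FIG. 11A) are all assumptions. Which of several equivalent merged hierarchies results depends on the visiting order.

```python
def transform(parent_list, child_list):
    """Build the ordered list: a list of levels, each mapping item -> chosen children."""

    def direct_children(item):
        return set().union(*child_list[item].values())

    # Level 0: items having no children (steps 1125-1133).
    levels = [{item: [] for item in sorted(child_list) if not child_list[item]}]
    ordered = dict(levels[0])              # every entry built so far, across levels

    def reach(item):
        """Children, grandchildren, etc. as encoded in the lower levels."""
        found = set()
        for kid in ordered.get(item, []):
            found |= {kid} | reach(kid)
        return found

    while True:
        next_level = {}
        for item in list(levels[-1]):                      # the work list
            for parent in sorted(parent_list[item]):
                others = parent_list[item] - {parent}
                if others & direct_children(parent):
                    continue                               # step 1149: a nearer ancestor exists
                entry = next_level.setdefault(parent, [])  # step 1151
                covered = set()
                for sibling in entry:
                    covered |= reach(sibling)
                if not (set(ordered[item]) & covered):     # step 1153
                    entry.append(item)
        if not next_level:                                 # nothing above: highest level reached
            return levels
        levels.append(next_level)
        ordered.update(next_level)


# Hypothetical input: two hierarchies over base items A, B, F, G with common top item D.
parent_list = {
    "A": {"C", "E", "D"}, "B": {"C", "I", "D"}, "F": {"H", "E", "D"}, "G": {"H", "I", "D"},
    "C": {"D"}, "E": {"D"}, "H": {"D"}, "I": {"D"}, "D": set(),
}
child_list = {
    "A": {}, "B": {}, "F": {}, "G": {},
    "C": {"h1": {"A", "B"}}, "H": {"h1": {"F", "G"}},
    "E": {"h2": {"A", "F"}}, "I": {"h2": {"B", "G"}},
    "D": {"h1": {"C", "H"}, "h2": {"E", "I"}},
}
levels = transform(parent_list, child_list)
```

With this input, the top item D ends up with only two children in the merged hierarchy, since the items E and I cover base items already reachable through C, so D can be computed from two previously aggregated results.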

[0216] FIG. 12 summarizes the components of an exemplary aggregation module that takes advantage of the hierarchy transformation technique described above. More specifically, the aggregation module includes a hierarchy transformation module that optimally merges multiple hierarchies into a single hierarchy that is functionally equivalent to the multiple hierarchies. A second module loads and indexes the base data supplied from the DBMS using the optimal hierarchy generated by the hierarchy transformation module. An aggregation engine performs aggregation operations on the base data. During the aggregation operations along the dimension specified by the optimal hierarchy, the results of the aggregation operations of the level 0 items may be used in the aggregation operations of the level 1 items, the results of the aggregation operations of the level 1 items may be used in the aggregation operations of the level 2 items, etc. Based on these operations, the loading and indexing operations of the base data, along with the aggregation, become very efficient, minimizing memory and storage access, and speeding up storing and retrieval operations.
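The level-by-level reuse of results described above may be illustrated with the following sketch. The function name aggregate_by_level, the list-of-dictionaries form of the ordered levels, the example values, and the use of summation are assumptions made for the example; the levels shown are hypothetical and reuse only the item names of FIG. 11A.

```python
def aggregate_by_level(levels, base_values):
    """Aggregate bottom-up; each level reuses the results of the levels below it."""
    totals = {item: base_values.get(item, 0) for item in levels[0]}
    for level in levels[1:]:
        for item, children in level.items():
            # Every child was computed in a lower level, so each result is computed once.
            totals[item] = sum(totals[child] for child in children)
    return totals


levels = [
    {"A": [], "B": [], "F": [], "G": []},
    {"C": ["A", "B"], "H": ["F", "G"], "E": ["A", "F"], "I": ["B", "G"]},
    {"D": ["C", "H"]},
]
totals = aggregate_by_level(levels, {"A": 1, "B": 2, "F": 3, "G": 4})
```

Here item D is obtained from the two already aggregated items C and H rather than from the four base items, in the spirit of the minimal computing path discussed with reference to FIG. 11A.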

[0217] FIG. 13 shows the stand-alone Aggregation Server of the present invention as a component of a central data warehouse, serving the data aggregation needs of URL directory systems, Data Marts, RDBMSs, ROLAP systems and OLAP systems alike.

[0218] The reason for the central multidimensional database's rise to corporate necessity is that it facilitates flexible, high-performance access and analysis of large volumes of complex and interrelated data.

[0219] A stand-alone specialized aggregation server, simultaneously serving many different kinds of clients (e.g. data mart, OLAP, URL, RDBMS), can deliver enterprise-wide aggregation in a cost-effective way. This kind of server eliminates roll-up redundancy across the group of clients, delivering scalability and flexibility.

[0220] Performance associated with a central data warehouse is an important consideration in the overall approach. Performance includes aggregation times and query response.

[0221] Effective interactive query applications require near real-time performance, measured in seconds. These application performance requirements translate directly into aggregation requirements.

[0222] In the prior art, in the case of MOLAP, a full pre-aggregation must be done before querying can start. In the present invention, by contrast, the query directed roll-up (QDR) allows instant querying while the full pre-aggregation is done in the background. In cases where a full pre-aggregation is preferred, the aggregation of the present invention outperforms any prior art. For the ROLAP and RDBMS clients, partial aggregations maximize query performance. In both cases, a fast aggregation process is imperative. The aggregation performance of the present invention is orders of magnitude higher than that of the prior art.

[0223] The stand-alone scalable aggregation server of the present invention can be used in any MOLAP system environment for answering questions about corporate performance in a particular market, economic trends, consumer behaviors, weather conditions, population trends, or the state of any physical, social, biological or other system or phenomenon on which different types or categories of information, organizable in accordance with a predetermined dimensional hierarchy, are collected and stored within a RDBMS of one sort or another. Regardless of the particular application selected, the address data mapping processes of the present invention will provide a quick and efficient way of managing a MDDB and also enabling decision support capabilities utilizing the same in diverse application environments.

[0224] The stand-alone “cartridge-style” plug-in features of the data aggregation server of the present invention provide freedom in designing an optimized multidimensional data structure and handling method for aggregation, provide freedom in designing a generic aggregation server matching all OLAP vendors, and enable enterprise-wide centralized aggregation.

[0225] The method of Segmented Aggregation employed in the aggregation server of the present invention provides flexibility, scalability, a condition for Query Directed Aggregation, and speed improvement.

[0226] The method of Multidimensional data organization and indexing employed in the aggregation server of the present invention provides fast storage and retrieval, provides a condition for Segmented Aggregation, speeds the storing, handling, and retrieval of data, and contributes the structural flexibility that allows sliced aggregation and QDR. It also enables the forwarding and single handling of data with improvements in speed performance.

[0227] The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention minimizes the data handling operations in multi-hierarchy data structures.

[0228] The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention eliminates the need to wait for full aggregation to be completed, and provides build-up of the aggregated data required for full aggregation.

[0229] In another aspect of the present invention, an improved DBMS system (e.g., RDBMS system, object oriented database system or object/relational database system) is provided that excels in performing two distinct functions, namely: the aggregation of data; and the handling of the resulting data for “on demand” client use. Moreover, because of improved data aggregation capabilities, the DBMS of the present invention can be employed in a wide range of applications, including Data Warehouses supporting OLAP systems and the like. For purposes of illustration, initial focus will be accorded to the DBMS of the present invention. Referring now to FIG. 19 through FIG. 21, the preferred embodiments of the method and system of the present invention will now be described in great detail herein below.

[0230] Throughout this document, the terms “aggregation” and “pre-aggregation” shall be understood to mean the process of summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division, etc. It shall be understood that pre-aggregation operations occur asynchronously with respect to the traditional query processing operations. Moreover, the term “atomic data” shall be understood to refer to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, atomic data may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch.

[0231] FIG. 19A illustrates the primary components of an illustrative embodiment of a DBMS of the present invention, namely: support mechanisms including a query interface and query handler; a relational data store including one or more tables storing at least the atomic data (and possibly summary tables) and a meta-data store for storing a dictionary (sometimes referred to as a catalogue or directory); and an MDD Aggregation Module that stores atomic data and aggregated data in a MDDB. The MDDB is a non-relational data structure: it uses other data structures, either instead of or in addition to tables, to store data. For illustrative purposes, FIG. 19A illustrates an RDBMS wherein the relational data store includes fact tables and a dictionary. It should be noted that the DBMS typically includes additional components (not shown) that are not relevant to the present invention. The query interface and query handler service user-submitted queries (in the preferred embodiment, SQL query statements) forwarded, for example, from a client machine over a network as shown. The query handler and relational data store (tables and meta-data store) are operably coupled to the MDD Aggregation Module. Importantly, the query handler and integrated MDD Aggregation Module operate to provide for dramatically improved query response times for data aggregation operations and drill-downs. Moreover, it is an object of the present invention to make user-querying of the non-relational MDDB no different than querying a relational table of the DBMS, in a manner that minimizes the delays associated with queries that involve aggregation or drill down operations. This object is enabled by providing the novel DBMS system and integrated aggregation mechanism of the present invention.

[0232] FIG. 19B shows the primary components of an illustrative embodiment of the MDD Aggregation Module of FIG. 19A, namely: a base data loader for loading the directory and table(s) of the relational data store of the DBMS; an aggregation engine for receiving dimension data and atomic data from the base data loader; a multi-dimensional database (MDDB); a MDDB handler and an SQL handler that operate cooperatively with the query handler of the DBMS to provide users with query access to the MDD Aggregation Module; and a control module for managing the operation of the components of the MDD Aggregation Module. The base data loader may load the directory and table(s) of the relational data store over a standard interface (such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.). In this case, the DBMS and base data loader include components that provide communication of such data over these standard interfaces. Such interface components are well known in the art. For example, such interface components are readily available from Attunity Corporation, http://www.attunity.com.

[0233] During operation, base data originates from the table(s) of the DBMS. The core data aggregation operations are performed by the Aggregation Engine, a Multidimensional Data (MDDB) Handler, and a Multidimensional Data Storage (MDDB). The results of data aggregation are efficiently stored in the MDDB by the MDDB Handler. The SQL handler of the MDD Aggregation Module services user-submitted queries (in the preferred embodiment, SQL query statements) forwarded from the query handler of the DBMS. The SQL handler of the MDD Aggregation Module may communicate with the query handler of the DBMS over a standard interface (such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.). In this case, the support mechanisms of the RDBMS and SQL handler include components that provide communication of such data over these standard interfaces. Such interface components are well known in the art. Aggregation (or drill down) results are retrieved on demand and returned to the user.

[0234] Typically, a user interacts with a client machine (for example, using a web-enabled browser) to generate a natural language query that is communicated to the query interface of the DBMS, for example over a network as shown. The query interface breaks the query down, via parsing, into a series of requests (in the preferred embodiment, SQL statements) that are communicated to the query handler of the DBMS. It should be noted that the functions of the query interface may be implemented in a module that is not part of the DBMS (for example, in the client machine). The query handler of the DBMS forwards requests that involve data stored in the MDDB of the MDD Aggregation Module to the SQL handler of the MDD Aggregation Module for servicing. Each request specifies a set of n-dimensions. The SQL handler of the MDD Aggregation Module extracts this set of dimensions and operates cooperatively with the MDD handler to address the MDDB using the set of dimensions, retrieve the addressed data from the MDDB, and return the results to the user via the query handler of the DBMS.

[0235] FIGS. 19C(i) and 19C(ii) together form a flow chart illustrating the operations of an illustrative DBMS of the present invention. In step 601, the base data loader of the MDD Aggregation Module loads the dictionary (or catalog) from the meta-data store of the DBMS. In performing this function, the base data loader may utilize an adapter (interface) that maps the data types of the dictionary of the DBMS (or that maps a standard data type used to represent the dictionary of the DBMS) into the data types used in the MDD Aggregation Module. In addition, the base data loader extracts the dimensions from the dictionary and forwards the dimensions to the aggregation engine of the MDD Aggregation Module.

[0236] In step 603, the base data loader loads table(s) from the DBMS. In performing this function, the base data loader may utilize an adapter (interface) that maps the data types of the table(s) of the DBMS (or that maps a standard data type used to represent the fact table(s) of the DBMS) into the data types used in the MDD Aggregation Module. In addition, the base data loader extracts the atomic data from the table(s), and forwards the atomic data to the aggregation engine.

[0237] In step 605, the aggregation engine performs aggregation operations (i.e., roll-up operations) on the atomic data (provided by the base data loader in step 603) along at least one of the dimensions (extracted from the dictionary of the DBMS in step 601) and operates cooperatively with the MDD handler to store the resultant aggregated data in the MDDB. A more detailed description of exemplary aggregation operations according to a preferred embodiment of the present invention is set forth below with respect to the QDR process of FIGS. 9A-9C.

[0238] In step 607, a reference is defined that provides users with the ability to query the data generated by the MDD Aggregation Module and/or stored in the MDDB of the MDD Aggregation Module. This reference is preferably defined using the Create View SQL statement, which allows the user to: i) define a table name (TN) associated with the MDDB stored in the MDD Aggregation Module, and ii) define a link used to route SQL statements on the table TN to the MDD Aggregation Module. In this embodiment, the view mechanism of the DBMS enables reference and linking to the data stored in the MDDB of the MDD Aggregation Engine as illustrated in FIG. 6(E). A more detailed description of the view mechanism and the Create View SQL statement may be found in C. J. Date, “An Introduction to Database Systems,” Addison-Wesley, Seventh Edition, 2000, pp. 289-326, herein incorporated by reference in its entirety. Thus, the view mechanism enables the query handler of the DBMS system to forward any SQL query on table TN to the MDD Aggregation Module via the associated link. In an alternative embodiment, a direct mechanism (e.g., NA trigger mechanism) may be used to enable the DBMS system to reference and link to the data generated by the MDD Aggregation Module and/or stored in the MDDB of the MDD Aggregation Engine as illustrated in FIG. 6F. A more detailed description of trigger mechanisms and methods may be found in C. J. Date, “An Introduction to Database Systems,” Addison-Wesley, Seventh Edition, 2000, pp. 250, 266, herein incorporated by reference in its entirety.

[0239] In step 609, a user interacts with a client machine to generate a query, and the query is communicated to the query interface. The query interface generates one or more SQL statements. These SQL statements may refer to data stored in tables of the relational datastore, or may refer to the reference defined in step 607 (this reference refers to the data stored in the MDDB of the MDD Aggregation Module). These SQL statement(s) are forwarded to the query handler of the DBMS.

[0240] In step 611, the query handler receives the SQL statement(s) and optionally transforms such SQL statement(s) to optimize the SQL statement(s) for more efficient query handling. Such transformations are well known in the art. For example, see Kimball, “Aggregation Navigation With (Almost) No MetaData”, DBMS Data Warehouse Supplement, August 1996, available at http://www.dbmsmag.com/9608d54.html.

[0241] In step 613, the query handler determines whether the received SQL statement(s) [or transformed SQL statement(s)] is on the reference generated in step 607. If so, operation continues to step 615; otherwise normal query handling operations continue in step 625, wherein the relational datastore is accessed to extract, store, and/or manipulate the data stored therein as directed by the query, and results are returned back to the user via the client machine, if needed. In step 615, the received SQL statement(s) [or transformed SQL statement(s)] is routed to the MDD aggregation engine for processing in step 617 using the link for the reference as described above with respect to step 607.
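The routing decision of steps 613 through 625 can be sketched as follows. This is only an illustration under stated assumptions: the test for whether a statement is "on the reference" is reduced to a naive pattern match on the table name following FROM or JOIN, and the names `references_table`, `route`, and the two handler callables are hypothetical stand-ins, not part of any DBMS interface described here.

```python
import re


def references_table(sql, table_name):
    """Naive check (assumption): does the statement name table_name
    directly after a FROM or JOIN keyword?"""
    pattern = r"\b(?:FROM|JOIN)\s+" + re.escape(table_name) + r"\b"
    return re.search(pattern, sql, flags=re.IGNORECASE) is not None


def route(sql, reference_name, mdd_handler, relational_handler):
    """Step 613: decide where the statement goes.

    Statements on the reference are forwarded over the link to the MDD
    Aggregation Module (step 615); all others follow normal relational
    query handling (step 625).
    """
    if references_table(sql, reference_name):
        return mdd_handler(sql)
    return relational_handler(sql)


# Hypothetical handlers that only report which path was taken.
to_mdd = lambda sql: "mdd"
to_relational = lambda sql: "relational"

on_reference = route("SELECT total FROM TN WHERE store = 'S1'", "TN",
                     to_mdd, to_relational)
on_table = route("SELECT * FROM sales", "TN", to_mdd, to_relational)
```

A production query handler would resolve the reference through its catalog rather than by text matching; the sketch only shows the branch structure.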

[0242] In step 617, the SQL statement(s) is received by the SQL handler of the MDD Aggregation Module, wherein a set of one or more N-dimensional coordinates is extracted from the SQL statement. In performing this function, the SQL handler may utilize an adapter (interface) that maps the data types of the SQL statement issued by the query handler of the DBMS (or that maps a standard data type used to represent the SQL statement issued by the query handler of the DBMS) into the data types used in the MDD Aggregation Module.

[0243] In step 619, the set of N-dimensional coordinates extracted in step 617 is used by the MDD handler to address the MDDB and retrieve the corresponding data from the MDDB. Finally, in step 621, the retrieved data is returned to the user via the DBMS (for example, by forwarding the retrieved data to the SQL handler, which returns the retrieved data to the query handler of the DBMS system, which returns the results of the user-submitted query to the user via the client machine), and the operation ends.
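Steps 617 and 619 can be pictured with a small sketch. Everything concrete in it is an assumption made for illustration: the dimension names, the restriction to simple `dimension = 'value'` predicates, the use of a Python dictionary as a stand-in for the MDDB, and the `"ALL"` marker for a dimension that the statement does not constrain.

```python
import re

# Hypothetical dimension order used to build a coordinate tuple.
DIMENSIONS = ("store", "day", "item")


def extract_coordinates(sql, dimensions=DIMENSIONS):
    """Step 617 (sketch): pull dim = 'value' predicates from a statement.

    Dimensions not mentioned in the statement are mapped to "ALL", an
    assumed marker for the fully aggregated member of that dimension.
    """
    found = dict(re.findall(r"(\w+)\s*=\s*'([^']*)'", sql))
    return tuple(found.get(dim, "ALL") for dim in dimensions)


# Stand-in for the MDDB: coordinates -> stored aggregated value.
mddb = {("S1", "ALL", "milk"): 42.0}


def lookup(sql):
    """Step 619 (sketch): address the store with the extracted coordinates."""
    return mddb.get(extract_coordinates(sql))


statement = "SELECT total FROM TN WHERE store = 'S1' AND item = 'milk'"
coords = extract_coordinates(statement)
value = lookup(statement)
```

The point of the sketch is that the statement is reduced to an N-dimensional address before the store is touched, so retrieval is a direct lookup rather than a table scan.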

[0244] It should be noted that the table data (base data), as it arrives from the DBMS, may be analyzed and reordered to optimize hierarchy handling, according to the unique method of the present invention, as described above with reference to FIGS. 11A, 11B, and 11C.

[0245] Moreover, the MDD control module of the MDD Aggregation Module preferably administers the aggregation process according to the method illustrated in FIGS. 9A and 9B. Thus, in accordance with the principles of the present invention, data aggregation within the DBMS can be carried out either as a complete pre-aggregation process, where the base data is fully aggregated before commencing querying, or as a query directed roll-up (QDR) process, where querying is allowed at any stage of aggregation using the “on-the-fly” data aggregation process of the present invention. The QDR process will be described hereinafter in greater detail with reference to FIG. 9C. The response to a request (i.e. a basic component of a client query) requiring “on-the-fly” data aggregation, or requiring access to pre-aggregated result data via the MDD handler, is provided by a query/request serving mechanism of the present invention within the MDD control module, the primary operations of which are illustrated in the flow chart of FIG. 6D. The function of the MDD Handler is to handle multidimensional data in the storage(s) module in a very efficient way, according to the novel method of the present invention, which will be described in detail hereinafter with reference to FIGS. 10A and 10B.

[0246] The SQL handling mechanism shown in FIG. 6D is controlled by the MDD control module. Requests are queued and served one by one. If the required data is already pre-calculated, then it is retrieved by the MDD handler and returned to the client. Otherwise, the required data is calculated “on-the-fly” by the aggregation engine, and the result is moved out to the client while simultaneously being stored by the MDD handler, as shown in FIG. 6C.
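The serving behavior described in this paragraph can be sketched as a queue in front of a store of previously computed results. The class name `RequestServer`, the two-dimensional (store, item) coordinates, the `"ALL"` marker, and summation as the aggregation are illustrative assumptions; the sketch is not the mechanism of FIG. 6D itself.

```python
from collections import deque


class RequestServer:
    """Serve queued requests, aggregating on the fly only when needed."""

    def __init__(self, base_rows):
        self.base_rows = base_rows  # list of (store, item, value) tuples
        self.mddb = {}              # stand-in for stored aggregated results
        self.queue = deque()        # requests are served one by one
        self.computed = 0           # number of on-the-fly aggregations run

    def submit(self, coords):
        self.queue.append(coords)

    def _aggregate(self, coords):
        # "ALL" (an assumed marker) matches every member of a dimension.
        store, item = coords
        self.computed += 1
        return sum(value for s, i, value in self.base_rows
                   if store in ("ALL", s) and item in ("ALL", i))

    def serve_next(self):
        coords = self.queue.popleft()
        if coords not in self.mddb:
            # Not pre-calculated: compute now and keep the result so that
            # later requests for the same coordinates are plain lookups.
            self.mddb[coords] = self._aggregate(coords)
        return self.mddb[coords]


server = RequestServer([("S1", "milk", 2.0),
                        ("S1", "bread", 3.0),
                        ("S2", "milk", 4.0)])
server.submit(("S1", "ALL"))
server.submit(("S1", "ALL"))
server.submit(("ALL", "milk"))
first = server.serve_next()
second = server.serve_next()
third = server.serve_next()
```

Here the second request for the same coordinates is answered from the stored result, so only two aggregations are run for three requests.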

[0247] As illustrated in FIG. 19G, the DBMS of the present invention as described above may be logically partitioned into a relational part and a non-relational part. The relational part includes the relational datastore (e.g., table(s) and dictionary) and support mechanisms (e.g., query handling services). The non-relational part includes the MDD Aggregation Module. As described above, bi-directional data flow occurs between the relational part and the non-relational part as shown. More specifically, during data load operations, data is loaded from the relational part (i.e., the relational datastore) into the non-relational part, wherein it is aggregated and stored in the MDDB. During query servicing operations, when a given query references data stored in the MDDB, data pertaining to the query is generated by the non-relational part (e.g., generated and/or retrieved from the MDDB) and supplied to the relational part (e.g., query servicing mechanism) for communication back to the user. Such bi-directional data flow represents an important distinguishing feature with respect to the prior art. For example, in the prior art MOLAP architecture as illustrated in FIG. 1B, unidirectional data flow occurs from the relational database (e.g., the Data Warehouse RDBMS system) into the MDDB during data loading operations.

[0248] FIGS. 20A and 20B outline two different implementations of the DBMS of the present invention. In both implementations, the query handler of the DBMS system supplies aggregated results retrieved from the MDD to a client.

[0249] FIG. 20A shows a separate-platform implementation of the DBMS system of the illustrative embodiment shown in FIG. 19A, wherein the relational part of the DBMS resides on a separate hardware platform and/or OS system from that used to run the non-relational part (MDD Aggregation Module). In this type of implementation, it is even possible to run parts of the DBMS system and the MDD Aggregation Module on different types of operating systems (e.g. NT, Unix, MAC OS).

[0250] FIG. 20B shows a common-platform implementation of the DBMS system of the illustrative embodiment shown in FIG. 20A, wherein the relational part of the DBMS shares the same hardware platform and operating system (OS) that is used to run the non-relational part (MDD Aggregation Module).

[0251] FIG. 21 shows the improved DBMS (e.g., RDBMS) of the present invention as a component of a data warehouse, serving the data storage and aggregation needs of a ROLAP system (or other OLAP systems alike). Importantly, the improved DBMS of the present invention provides flexible, high-performance access and analysis of large volumes of complex and interrelated data. Moreover, the improved Data Warehouse DBMS of the present invention can simultaneously serve many different kinds of clients (e.g. data mart, OLAP, URL) and can deliver enterprise-wide data storage and aggregation in a cost-effective way. This kind of system eliminates redundancy across the group of clients, delivering scalability and flexibility. Moreover, the improved DBMS of the present invention can be used as the data store component of any informational database system as described above, including data analysis programs such as spread-sheet modeling programs, serving the data storage and aggregation needs of such systems.

[0252] FIG. 22 shows an embodiment of the present invention wherein the DBMS (e.g., RDBMS) of the present invention is a component of a data warehouse/OLAP system. The DBMS operates as a traditional data warehouse, serving the data storage and aggregation needs of an enterprise. In addition, the DBMS includes integrated OLAP Analysis Logic (and preferably an integrated Presentation Module, not shown) that operates cooperatively with the query handling of the DBMS system and the MDD Aggregation Module to enable users of the DBMS system to execute multidimensional reports (e.g., ratios, ranks, transforms, dynamic consolidation, complex filtering, forecasts, query governing, scheduling, flow control, pre-aggregate inferencing, denormalization support, and/or table partitioning and joins) and preferably perform traditional OLAP analyses (grids, graphs, maps, alerts, drill-down, data pivot, data surf, slice and dice, print). Importantly, the improved DBMS of the present invention provides flexible, high-performance access and analysis of large volumes of complex and interrelated data. Moreover, the improved DBMS of the present invention can simultaneously serve many different kinds of clients (e.g. data mart, other OLAP systems, URL-Directory Systems) and can deliver enterprise-wide data storage and aggregation and OLAP analysis in a cost-effective way. This kind of system eliminates redundancy across the group of clients, delivering scalability and flexibility. Moreover, the improved DBMS of the present invention can be used as the data store component of any informational database system as described above, serving the data storage and aggregation needs of such systems.

[0253] Functional Advantages Gained by the Improved DBMS of the Present Invention

[0254] The features of the DBMS of the present invention provide for dramatically improved response time in handling queries issued to the DBMS that involve aggregation, thus enabling enterprise-wide centralized aggregation. Moreover, in the preferred embodiment of the present invention, users can query the aggregated data in a manner no different than traditional queries on the DBMS.

[0255] The method of Segmented Aggregation employed by the novel DBMS of the present invention provides flexibility, scalability, the capability of Query Directed Aggregation, and speed improvement.

[0256] Moreover, the method of Query Directed Aggregation (QDR) employed by the novel DBMS of the present invention minimizes the data handling operations in multi-hierarchy data structures, eliminates the need to wait for full aggregation to be complete, and provides for buildup of aggregated data required for full aggregation.

[0257] It is understood that the System and Method of the illustrative embodiments described herein above may be modified in a variety of ways which will become readily apparent to those skilled in the art having the benefit of the novel teachings disclosed herein. All such modifications and variations of the illustrative embodiments thereof shall be deemed to be within the scope and spirit of the present invention as defined by the claims to Invention appended hereto.

What is claimed is:
 1. A data aggregation module comprising an aggregation engine and a multidimensional datastore, wherein the multidimensional datastore stores multidimensional data logically organized along N dimensions, and wherein the aggregation engine performs data aggregation operations on said multidimensional data by: (a) performing a first stage of data aggregation operations along a first dimension of said N dimensions; and (b) performing a second stage of aggregation operations for a given slice in the first dimension along another dimension of said N dimensions.
 2. The data aggregation module of claim 1, wherein the second stage of aggregation operations involves recursive data aggregation operations for slices in said N dimensions.
 3. The data aggregation module of claim 1, further comprising a data loading mechanism for loading data from a database, and a storage handler for storing in the multidimensional datastore the data loaded from the database and the aggregated data generated by the aggregation engine.
 4. The data aggregation module of claim 1, further comprising an interface for receiving queries generated by a requester, and control logic that, upon determining that the multi-dimensional datastore does not contain data required to service a given query, controls the aggregation engine to generate aggregated data required to service the given query and controls the aggregation module to return the aggregated data to the requester.
 5. The data aggregation module of claim 1, wherein said interface interfaces to an OLAP server (comprising OLAP analysis logic and presentation logic) and client machines operably coupled to the OLAP server to provide user-directed OLAP analysis, to thereby realize an OLAP system capable of performing data aggregation operations on the data, and storing and managing such data.
 6. The data aggregation module of claim 3, wherein said client machines include a web-browser-based user interface that enables user access to the OLAP server.
 7. The data aggregation module of claim 1, integral to a DBMS to thereby realize an improved DBMS capable of performing data aggregation operations on the data, and storing and managing such data.
 8. The data aggregation module of claim 1, integral to a DBMS operably coupled to a plurality of client machines over a network, to thereby realize a data warehouse capable of performing data aggregation operations on the data, and storing and managing such data.
 9. The data aggregation module of claim 1, wherein said interface implements a standard protocol for accessing data.
 10. The data aggregation module of claim 9, wherein the standard protocol comprises one of OLDB, OLE-DB, ODBC, SQL, and JDBC.
 11. The data aggregation module of claim 1, wherein the aggregation engine stores the resultant data of aggregation operations for the given slice as a record in a data file, wherein location of the record in the data file is stored in a directory.
 12. The data aggregation module of claim 11, wherein the directory stores, for a given record, a start address and end address of the record and a physical address of the data file.
 13. A method for data aggregation for use with a multidimensional datastore that stores multidimensional data logically organized along N dimensions, the method comprising the steps of: (a) performing a first stage of data aggregation operations along a first dimension of said N dimensions; (b) performing a second stage of aggregation operations for a given slice in the first dimension along another dimension of said N dimensions; and (c) storing resultant data in said multidimensional datastore.
 14. The method of claim 13, wherein step (b) involves recursive data aggregation operations for slices in said N dimensions.
 15. The method of claim 13, further comprising the step of loading data from a database, and storing the data loaded from the database in said multidimensional datastore.
 16. The method of claim 13, further comprising the steps of: receiving queries generated by a requester; and upon determining that the multi-dimensional datastore does not contain data required to service a given query, controlling the aggregation engine to generate aggregated data required to service the given query and returning the aggregated data to the requester.
 17. The method of claim 16, wherein the queries are received over an interface that implements a standard protocol for accessing data.
 18. The method of claim 17, wherein the standard protocol comprises one of OLDB, OLE-DB, ODBC, SQL, and JDBC.
 19. The method of claim 13, wherein the resultant data of aggregation operations for the given slice is stored as a record in a data file, wherein location of the record in the data file is stored in a directory.
 20. The method of claim 19, wherein the directory stores, for a given record, a start address and end address of the record and a physical address of the data file.